Full article: MS_HGNN: a hybrid online fraud detection model to alleviate graph-based data imbalance

Formulae display: $MathJax Logo$ ?Mathematical formulae have been encoded as MathML and are displayed in this HTML version using MathJax in order to improve their display. Uncheck the box to turn MathJax off. This feature requires Javascript. Click on a formula to zoom.

Abstract

Online transaction fraud has become increasingly rampant due to the convenience of mobile payment. Fraud detection is critical to ensure the security of online transactions. With the development of graph neural network, researchers have applied it to the field of fraud detection. The existing fraud detection methods will solve the class imbalance by sampling, but they do not fully consider the various imbalances in the heterogeneous graph, and the data imbalance will directly affect the performance of the model. This work proposes a hybrid graph neural network model for online fraud detection to address this issue. The three types of imbalance in online transactions are feature imbalance, category imbalance, and relation imbalance, and they are all addressed in the proposed model. The entities with the feature most closely related to the fraudsters will be determined for the feature imbalance, and samples will be taken for further identification in the subsequent training phase. The hybrid model then uses under-sampling in combination with the long-distance sampling to find nodes with high similarity of features for the category imbalance. Finally, we propose a reward/punishment mechanism based on reinforcement learning for relation imbalance, which uses the threshold created by training as the sampling weight between relations. This paper conducts experiments on the public datasets Amazon and Yelp. The experimental results show that the model proposed is 5.61% higher than the best model in the comparison model on Amazon dataset, and 1.58% higher on Yelp dataset.

Keywords:

1. Introduction

Mobile payment has penetrated into various fields, including finance (Jiang et al., Citation2021; Reurink, Citation2016), medical treatment (Waghade & Karandikar, Citation2018), and insurance (Wang & Xu, Citation2018, January) with the advancement of Internet technology and intelligent mobile devices. Mobile payment facilitates online transactions while also lowering the cost of several potentially fraudulent activities. A fraudster constantly bypasses the anti-fraud system for deception by pretending to be a benign entity, similar to fraud detection in copyright protection (Liang et al., Citation2020) and intrusion detection (Liang, Xiao et al., Citation2021). This situation severely obstructs online transactions and results in financial losses for users who participate in them.

Fraud detection should meet real-time performance, stability, and interpretability criteria (Hameed, Citation2022). Traditional techniques combine previous data with stream processing to build cumulative features. The classification algorithm then trains the generated features to create the detection model. However, fraudsters may alter their methods to escape detection by the risk control mechanism. This situation will have an influence on fraud detection accuracy because the entities in the real world are relative to one another. Even if a fraudster changes his fraud methods, he will not be able to collect all the relations. Accordingly, the other entities in the same network will have difficulties simultaneously evading the risk control with the same operations. The graph-based model is a representation of real-world entities and their relations. The real-world entities are generally regarded as nodes. The relational connections between entities are expressed as edges in the graph. Accordingly, the information feature carried by each entity are node features. Moreover, the relation information between two nodes is regarded as an edge feature, and the different relationships between two entities are expressed as edge weights. The graph structure has been widely used in fraud detection models in recent years because of its good performance in fraudster identification.

Most previous fraud detection models utilise the graph neural network (GNN) model for fraud detection but rarely focus on data imbalance problems. However, data imbalance is closely related to the aggregation process of the GNN, which directly affects the classification accuracy of the designed classifier. Accordingly, selecting appropriate neighbours for aggregation is crucial. Therefore, we categorise data imbalance into three types when analysing it in fraud detection.

(1) Category imbalance: The number of fraudsters in the fraud detection task will be significantly fewer than the number of benign entities, as demonstrated in Figure . For example, only 20% of users are tagged as fraudulent nodes in the Amazon fraud dataset. Additionally, only 14.5% of the comments are flagged as spam in the Yelp public dataset.

Figure 1. Data imbalance graph structure.

(2) Feature imbalance: The fraudster can disguise his actions as those of a benign entity. The fraudsters' features are similar to those of a benign entity by introducing noise or linking several benign entities, as shown in the feature vector in Figure . For example, a spammer will post spam using benign accounts and connect with multiple benign users to hide. Majority of the GNN-based models are unable to provide highly accurate predictions following aggregation due to feature imbalance.

(3) Relation imbalance: In real life, two entities might have a variety of relations. Each relation has different effects on the entity. For example, in the three relations in Figure , relation 2 has a higher number than relations 1 and 3. Relation 2 is more important for most nodes in the network.

Accuracy is relatively poor when employing only GNN to capture these fraudsters due to the adaptability of real-world fraudsters. We propose a hybrid online fraud detection model for alleviation by considering the aforementioned imbalance problems. The category imbalance problem in training is solved using a neighbour sampler based on the under-sampling method. Moreover, a feature information extractor is constructed by using the entity features to obtain the distance scores. Reinforcement learning is used as a reward and punishment mechanism to determine the sample weight of each relation. Furthermore, the entire graph is sampled with high-order neighbours using the sampling weight. In this paper,the designed model is named MS_HGNN(Mixed Sampling Heterogeneous Graph Neural Network). The effectiveness of the proposed model in online fraud detection is demonstrated by experiments using public fraud detection datasets.

The remainder of this paper is organised as follows. Section 2 analyses the existing models. Section 3 introduces the mathematical definitions and descriptions of the proposed model. Section 4 illustrates the framework and flow of the proposed model, and Section 5 evaluates and analyses experimental results. Finally, it summarises and prospects the paper and suggests future research directions.

2. Related work

Many fraud detection algorithms have been reported in the field of data mining. The authors in Awoyemi et al. (Citation2017) utilised machine learning to address the challenge of fraud detection. The naive Bayes, K-nearest neighbour, and logistic regression are adopted to address credit card fraud (Ghoshdastidar et al., Citation2022). The machine learning algorithm is relatively traditional, and its efficiency is not very high. In Fiore et al. (Citation2019),the generative adversarial network was utilised to generate a small number of class instances. This network is combined with the original dataset to form an enhanced training dataset for better effectiveness of the classifier. However, the generative antagonism network is likely to introduce data noise, resulting in worse training results of the model.

GNN-based models have been developed in variety of fields, including image identification (Liang, Long et al., Citation2021), recommendation systems (Chen et al., Citation2021), and intrusion detection (Long et al., Citation2022), due to their good ability in data comprehension and cognition. The authors in Wang et al. (Citation2020) proposed a GAT-based semi-supervised GNN, called SemiGNN. However, when node behaviour changes or new fraud occurs, its flexibility may be relatively poor. This framework is capable of detecting fraud in multi-view labelled and unlabelled data. In Liu et al. (Citation2018), a GEM model learned the importance of nodes with various types by utilising the attention mechanism, while the hierarchical attention mechanism is used to set the model (Hu et al., Citation2019).The authors in Zhang, Fan et al. (Citation2019) designed the Player2Vec model by combining the attention mechanism with the GNN. The attention mechanism can improve the classification effect of the model to a certain extent, but it will increase the corresponding time cost.

Data imbalance has attracted widespread attention in the field of fraud detection in recent years. The authors (Fatima et al., Citation2021) proposed three feature selection algorithms, namely, RONS, ROS, and ROA, which are constructed by sparse feature selection to minimise overlap and perform binary classification. This method is helpful to deal with the problem of missing features. The authors (Zhang, Liu et al., Citation2019) focussed on class imbalance and sample size. The four factors of class separability and intra-class concept present a comprehensive model of clustering tree.This model solves the class imbalance problem, but does not fully consider other data imbalance problems in fraud detection. The authors in Kavitha and Suriakala (Citation2018) created a meta-classifier based on the real-time tree using random forest and decision tree. There are also some sampling methods based on K-means clustering, and genetic algorithm (Benchaji et al., Citation2018) and machine learning models based on Support Vector Machine(SVM) recursive feature elimination and hyperparameter optimisation (Rtayli & Enneya, Citation2020). The machine learning method may make a small contribution to the overall performance improvement. In the field of GNNs, many models for solving the data imbalance problem have also been proposed. The authors in Liu et al. (Citation2020) proposed a method of inconsistency score for filtering neighbours for the problems of inconsistent context, inconsistent features, and inconsistent relations. The label-balanced sampler used in Liu et al. (Citation2021) selects nodes and edges to construct sub-graphs for small-batch training, which is referred to as a selection GNN. In the research of graph neural network, there are better solutions to the problem of data category imbalance. However, the existing research has not comprehensively considered various data imbalance phenomena in heterogeneous graph networks.

The performance of the classifier in these models is determined by the GNN aggregation result. An appropriate aggregation neighbour must be selected because of its significant function in the aggregation. Accordingly, this work aims to find the optimal aggregation neighbours in the training process and proposes a fraud detection model with good performance. Data imbalance can be categorised into three types: category imbalance, feature imbalance, and relation imbalance (Buda et al., Citation2018; Ebenuwa et al., Citation2019). Category imbalance means that the number of fraudsters is significantly smaller than the number of normal entities. In the category imbalance, the fraudsters can imitate normalcy by adding noise or connecting to several normal entities to make it similar to normal entities. The GNN-based fraud detection model cannot achieve a high detection accuracy with the participation of fraudsters. Various relations exist among entities. Each relation has a unique effect on the entities. Accordingly, the importance of relations should be considered as this implies a relation imbalance. This work proposes a hybrid GNN-based fraud detection model to address the above-mentioned issues. This model includes a neighbour sampler and a feature extractor. The under-sampling method is used to solve the issue of category imbalance. The distance scores are generated on the basis of the features. The sampled weights for each relation can be created for high-order neighbour sampling via deep reinforcement. The performance of this work is evaluated on the public fraud detection dataset. The evaluation results show the effectiveness of the proposed model.

3. Mathematical model of MS_HGNN

Let $G = {V, X, {A_{r}} |_{r = 1}^{R}, Y}$ be the multi-relation graph of the real-world dataset. $V = {v_{i} | i \in {1, 2, \dots, n}}$ represents the node collection, and $v_{i}$ is the ith node in the graph. $X = {x_{i} | i \in {1, 2, \dots, n}}$ denotes the feature collection $x_{i} \in R^{n \times d}$ , in which $x_{i}$ is the feature of the ith node with the length of d. $A = {A_{i} | i \in {1, 2, \dots, r}}$ represents the adjacent matrix of nodes for various relations. Herein, we have $r \in {1, 2, \dots, R}$ . $Y = {y_{1}, y_{2}, \dots, y_{n}}$ denotes the label collection of all nodes.

3.1. Hybrid sampling model

To alleviate the imbalance problem, this work proposes a hybrid sampling model, as shown in Figure . This model includes three stages. Firstly, the features of the selected dataset are extracted. After that, the hybrid sampler which combines the under-sampling and long-distance sampling techniques is used to collect the similar nodes with the given sample. After that, the reinforcement learning is used to calculate the distance score and feature similarity. On this basis, the threshold between relationships can be calculated. Finally, the model aggregates the relationships.

Figure 2. The flow of hybrid sampling model.

3.1.1. Long-distance sampling

The classes of graph-based data are often imbalanced. A small number of some classes will cause inaccurate classification results in the binary classification problem. If the sampling approach of classification or the selected evaluation metric is unsuitable, then the model's overall performance cannot be objectively evaluated. Here, we adopt the under-sampling and Long-distance sampling. This algorithm performs sampling from the normal nodes without replacement. The number of samples comes close to matching the number of fraud nodes. In this case, the model can better predict the fraud nodes. Here, the training sample set is represented as $N_{(v)}$ .

Fraud, benign, and sampled nodes are presented in Figure . The main node $V_{0}$ is the fraud node. If we utilise first-order neighbour sampling, then only the fraud nodes can be sampled. The second sampling method is utilised to sample the main node's second and third order neighbours. In this way, more nodes that are similar to the main node can be sampled by evaluating the similarity of features, such as the second-order neighbour $V_{18}$ and the third neighbour $V_{16}$ . As a result, the model will be thoroughly trained and achieve high training accuracy.

Figure 3. Long-distance sampling.

The fraud nodes can pretend to be normal and connect to the normal nodes. Here, long-distance sampling is utilised to predict the relations among nodes. The k-order neighbours of the sampled training nodes are used for the subsequent neighbour sampling. More nodes similar to the major node will be found using the k-order neighbours using Equation (Equation1(1) $P_{(k)}^{(r)} = A^{(r) k},$ (1) ): (1) $P_{(k)}^{(r)} = A^{(r) k},$ (1) then, we can calculate the k-order neighbours under the relation r. The final k-order adjacent matrix for sampling can be calculated using Equation (Equation2(2) $A_{(k)}^{(r)} = \sum_{i = 1}^{k} P_{(k)}^{(r)},$ (2) ): (2) $A_{(k)}^{(r)} = \sum_{i = 1}^{k} P_{(k)}^{(r)},$ (2) where $P_{(k)}^{(r)}$ is the k-order adjacent matrix under relation r; and $A_{(k)}^{(r)}$ represents the adjacent matrix of the 1-order to k-order neighbour, which will be used for subsequent neighbour sampling.

3.1.2. Neighbor sampling

The multi-layer preceptor (MLP) is utilised as the feature extractor in the neighbour sampling. The feature will be used for calculating the distance between neighbours and be the input of the subsequent GNN. The distance between the training node and the neighbour node can be calculated using Equation (Equation3(3) $D_{(v, u)}^{l} = σ (W_{2} (W_{1} h_{v}^{(l - 1)})) - σ (W_{2} (W_{1} h_{u}^{(l - 1)})),$ (3) ): (3) $D_{(v, u)}^{l} = σ (W_{2} (W_{1} h_{v}^{(l - 1)})) - σ (W_{2} (W_{1} h_{u}^{(l - 1)})),$ (3) where W corresponds to linear transformation (ignore the bias term for simplicity), σ indicates activation function.

Shorter distance represents higher similarity to the major node, which will be sampled with a high probability. The average distance score of each epoch can be calculated by using Equation (Equation4(4) $S_{(r)}^{(e)} = \frac{\sum_{v \in N_{(v)}} D_{(r)}^{(e - 1)} (Z_{v}, h_{v}^{r})}{| | N_{(v)} | |},$ (4) ): (4) $S_{(r)}^{(e)} = \frac{\sum_{v \in N_{(v)}} D_{(r)}^{(e - 1)} (Z_{v}, h_{v}^{r})}{| | N_{(v)} | |},$ (4) where $D_{(r)}^{(e - 1)} (Z_{v}, h_{v}^{r})$ represents using Equation (Equation3(3) $D_{(v, u)}^{l} = σ (W_{2} (W_{1} h_{v}^{(l - 1)})) - σ (W_{2} (W_{1} h_{u}^{(l - 1)})),$ (3) ) to calculate the distance between $h_{v}^{r}$ of each node v under relation r after each epoch and $Z_{r}$ after the aggregation of node v between relations. $| | N_{v} | |$ denotes the number of training nodes, and $S_{(r)}^{(e)}$ represents the similarity between the features under the relation of all training nodes and that after aggregating all relations.

This work uses the most classic example of reinforcement learning Bernoulli Multi-armed Bandit (Vermorel & Mohri, Citation2005). The reinforcement learning mechanism is embedded in the GNN to optimise the GNN training. The training data of the graph neural network is used to determine the number of neighbour samples under each relation in the next training, initialise the proportion of training under each relation to $Q = q_{1}, q_{2}, \dots, q_{r}$ , and set two states, namely, $S_{1} : S_{r}^{(e)} - S_{r}^{(e - 1)} \geq 0$ and $S_{1} : S_{r}^{(e)} - S_{r}^{(e - 1)} < 0$ , which correspond to two actions, such as $a_{S_{1}} = 0.02$ and $a_{S_{2}} = 0.03$ , respectively. The reward function is defined as $f (s, a)$ in Equation (Equation5(5) $f (s, a) = {\begin{aligned} 1, S_{1} \\ - 1, S_{2} . \end{aligned}$ (5) ): (5) $f (s, a) = {\begin{aligned} 1, S_{1} \\ - 1, S_{2} . \end{aligned}$ (5) Finally, q is generated by the reward function $q^{e} = q^{e - 1} + f^{(e)} (s, a) a_{s}$ , which is also used to update the value of $q^{e}$ under each relation.

3.2. GNN aggregation model

In the model of HAN (Wang, Ji et al., Citation2019), the heterogeneous graphs are transformed into multiple homogeneous graphs, and the training is individually carried out on each graph. Finally, the final features are created by combining the features of various nodes under different relations. In Wang, Ji et al. (Citation2019), the node and semantic attention mechanisms are utilised to capture neighbours-level and meta-path-based information, respectively. Using the attention mechanism would increase the computation complexity because we processed the samples in advance; hence, the attention mechanism is not used in our proposed method. Therefore, we apply message passing on each relation-induced subgraph as shown in Equations (Equation6(6) $\begin{aligned} h_{N_{(u)}}^{l + 1} & = M e a n ({M L P (h_{u}^{l}), \forall u \in N_{(u)}}), \end{aligned}$ (6) ) and (Equation7(7) $\begin{aligned} h_{v}^{l + 1} & = c o n c a t ({h_{v}^{l}, h_{N_{(u)}}^{l + 1}}, \forall v \in N_{(v)}), \end{aligned}$ (7) ): (6) $\begin{aligned} h_{N_{(u)}}^{l + 1} & = M e a n ({M L P (h_{u}^{l}), \forall u \in N_{(u)}}), \end{aligned}$ (6) (7) $\begin{aligned} h_{v}^{l + 1} & = c o n c a t ({h_{v}^{l}, h_{N_{(u)}}^{l + 1}}, \forall v \in N_{(v)}), \end{aligned}$ (7) where $h_{N_{(u)}}^{l + 1}$ represents the aggregation of the neighbouring nodes of node v, and $h_{v}^{l + 1}$ is the aggregation of the feature of the neighbouring node and the feature of the upper-level node $h_{v}^{l}$ . Then, we aggregate the features of each relation of each node v and denote the feature of the final node v as $z_{v}^{l}$ , as shown in Equation (Equation8(8) $z_{v}^{l} = R e L U (M e a n (z_{v}^{l - 1} ◯ + {q_{r}^{l - 1} h_{v, r}^{l}} |_{r = 1}^{R}, \forall v \in N_{(v)})) .$ (8) ): (8) $z_{v}^{l} = R e L U (M e a n (z_{v}^{l - 1} ◯ + {q_{r}^{l - 1} h_{v, r}^{l}} |_{r = 1}^{R}, \forall v \in N_{(v)})) .$ (8) where $◯ +$ represents splicing operation. After GNN aggregation, the CrossEntropyLoss function is used in optimisation. CrossEntropyLoss combines the LogSoftmax and the NLLLoss functions, as shown in Equation (Equation9(9) $L o s s = - \sum_{v \in N_{(v)}} y_{v} * l o g (s o f t m a x (M L P (z_{v}))) .$ (9) ): (9) $L o s s = - \sum_{v \in N_{(v)}} y_{v} * l o g (s o f t m a x (M L P (z_{v}))) .$ (9)

4. MS_HGNN model for fraud detection

The fraud detection problem is to distinguish between the normal nodes and the fraud nodes in the multi-relation graph. The nodes in the dataset are labelled with {0,1}. Here, the normal and fraud nodes are represented by 0 and 1, respectively. In this case, the problem belongs to the semi-supervised binary classification. The labelled nodes in the multi-relation are used for model training, thus predicting the unlabelled nodes.

4.1. System framework

In Figure , the proposed model consists of the hybrid sampling model and the GNN model. The hybrid sampler mainly involves the long-distance sampling and the neighbour sampling, depicted as follows.

Figure 4. The structure of the MS_HGNN model.

First, the feature information of the selected dataset is extracted. The under-sampling and the long-distance sampling algorithms are integrated to collect the nodes similar to the neighbours. Finally, the RL-based reward/punishment mechanism is used for sampling the weight between the relations. In this case, the relations among the training nodes are aggregated into clusters.

4.2. Fraud detection algorithm

In this section, we describe the critical algorithm flow of the MS_HGNN model. We focus on two algorithms, namely, hybrid sampling and GNN aggregation.

4.2.1. Hybrid sampling algorithm

The hybrid sampling algorithm generates the final set of sampled neighbour nodes and the next neighbour sampling threshold. The inputs include $G = {V, X, {A_{r}} |_{r = 1}^{R}, Y}$ , K, $Q^{0}$ , a collection of nodes for training obtained after under-sampling $N_{v}$ . The output is the final sampled neighbour node-set $N_{u}$ and the next neighbour sampling threshold $Q^{e}$ . The detailed flow is described as follows and the pseudocode is shown in Algorithm 1.

Step1: We set up an MLP as a feature extractor, which will calculate the feature similarity and select neighbours based on aggregation. The extractor will be inputted into the GNN as the feature of the node.

Step2: Then, we use Equations (Equation1(1) $P_{(k)}^{(r)} = A^{(r) k},$ (1) ) and (Equation2(2) $A_{(k)}^{(r)} = \sum_{i = 1}^{k} P_{(k)}^{(r)},$ (2) ) to perform long-distance sampling on the nodes under each relation, that is, the high-order neighbours of the nodes, and finally, obtain the high-order adjacency matrix $A_{k}^{r}$ .

Step3: We perform neighbour sampling on this batch of nodes according to the $Q^{e - 1}$ obtained from the previous neighbour sampling to obtain $N_{u}^{e}$ .

Step4: The similarity between the different relations of each node can be calculated by using Equation (Equation3(3) $D_{(v, u)}^{l} = σ (W_{2} (W_{1} h_{v}^{(l - 1)})) - σ (W_{2} (W_{1} h_{u}^{(l - 1)})),$ (3) ). Finally, reinforcement learning participates in the process to obtain the weight between the relations $q^{e}$ .

4.2.2. MS_HGNN algorithm

This algorithm generates the vector representation of each node. The inputs include $G = {V, X, {A_{r}} |_{r = 1}^{R}, Y}$ , K, $Q^{0}$ , initialisation parameters $Q = {q_{1}, q_{2}, \dots, q_{r}}$ , the number of model layers L, the number of epochs E, and a collection of nodes for training obtained after under-sampling $N_{(v)}$ . The output is the vector representation of each node v in $N_{(v)}$ .

Step1: Calculate $N_{(u)}$ with the above neighbour sampling algorithm.

Step2: Aggregate the neighbour $N_{(u)}$ sampled by the neighbour sampler for each $v \in N_{(v)}$ and R to obtain $h_{N_{(u)}}^{l}$ with Equation (Equation6(6) $\begin{aligned} h_{N_{(u)}}^{l + 1} & = M e a n ({M L P (h_{u}^{l}), \forall u \in N_{(u)}}), \end{aligned}$ (6) ).

Step3: Aggregate the training node and the corresponding neighbour node to obtain $h_{v}^{l}$ with Equation (Equation7(7) $\begin{aligned} h_{v}^{l + 1} & = c o n c a t ({h_{v}^{l}, h_{N_{(u)}}^{l + 1}}, \forall v \in N_{(v)}), \end{aligned}$ (7) ). Repeat Step3 for each $u \in N_{(u)}$ .

Step4: Aggregate each relation of the training node to obtain $Z_{v}$ with Equation (Equation8(8) $z_{v}^{l} = R e L U (M e a n (z_{v}^{l - 1} ◯ + {q_{r}^{l - 1} h_{v, r}^{l}} |_{r = 1}^{R}, \forall v \in N_{(v)})) .$ (8) ).

The final vector representation of all nodes in $N_{(v)}$ can be generated using the above-mentioned steps. The pseudocode is shown in Algorithm 2

5. Experiments and analysis

5.1. Dataset

We use the Amazon and Yelp (Dou et al., Citation2020) datasets for performance evaluation in fraud detection. The Amazon dataset includes product reviews under the Amazon musical instrument category. It marks more than 80% of users who help in voting as benign nodes and less than 20% of users who assist in voting as fraudulent nodes. Each node contains 25 features. The Amazon dataset regards users as nodes in the graph and divides the connections between nodes into three relations: (1) U-P-U indicates that two connected users have viewed the same product. (2) U-S-U indicates that the two connected users have at least a one-star rating within a week. (3) U-V-U: It connects the top 5% of users with mutual evaluation text similarity (measured by TF-IDF) among all users. The Yelp dataset aims to collect the hotel and restaurant reviews from Yelp.com. The nodes in the graph of this dataset are reviewed using 100 dimensions features. The relations among nodes include three types: (1) R-U-R indicates the reviews from the same users; (2) R-T-R indicates the reviews for the same product in the same month; (3) R-S-R indicates the reviews for the same product with the same level. The composition of the graph-based Amazon and Yelp review datasets is described in Table .

Table 1. The composition of graph-based Amazon and Yelp dataset.

Download CSV Display Table

5.2. Evaluation metrics and reference models

The AUC and Recall are utilised as evaluation metrics to measure the performance of all proposed models. AUC is the most commonly used evaluation index to measure the pros and cons of a two-class model, which will not be affected by the influencing factors of category imbalance. AUC is the area under the ROC curve, which depicts the relationship between true positive rate (TPR) and false-positive rate (FPR). The definition of Recall is presented in Equation (Equation11(11) $\begin{aligned} R e c a l l & = \frac{T P}{T P + F N} . \end{aligned}$ (11) ). (10) $\begin{aligned} F P R & = \frac{F P}{T N + F P} . \end{aligned}$ (10) (11) $\begin{aligned} R e c a l l & = \frac{T P}{T P + F N} . \end{aligned}$ (11) The performance of the proposed model is compared with the basic GNN method and several recently proposed GNN-based fraud detection models for better evaluation. The selected baselines are as follows.

GCN Kipf and Welling (Citation2016).) is a generalisation of convolutional neural networks on graphs, by utilising the local first-order approximation of atlas convolution to determine the structure of the convolutional network. The applicability of graph convolution is relatively wide, and it is suitable for nodes and graphs of any topology.
GAT Velikovi et al. (Citation2017) indicates that the attention mechanism is used in the aggregation process of GNNs.
GraphSAGE Hamilton et al. (Citation2017) is similar to the method proposed in this work, where the neighbours are sampled before neighbour aggregation, and uses a fixed number of neighbours sampling.
DotGat Vinayaka and Jaidhar (Citation2021) uses GCN and the attention method to aggregate graph data.
TAG Du et al. (Citation2017).) is a topological adaptive graph convolution based on a multi-hop neighbourhood.
RGCN Chen et al. (Citation2019) is a hidden state similar to a recurrent neural network. The node state is a representation of the hidden state and neighbour node information.
Player2Vec Zhang, Fan et al. (Citation2019) divides the multi-view network into multiple single-view networks. It is a fraud detection model based on GCN and attention mechanisms.
FdGars Wang, Wen et al. (Citation2019) is the first graphical convolutional network method for fraudster detection in an online application review system.
GraphConsis Liu et al. (Citation2020) is a fraud detection model proposed for the problem of graph data inconsistency.

In above models, GCN, GAT, GraphSAGE and DotGat belong to single-relation. The others are multiple relation.

5.3. Evaluation results

The proposed MS_HGNN is designed and implemented using Pytorch 1.6.0 and Python3.7. The experiments are executed on an 8-core server, 40 G memory, and the Operating System is Ubuntu 18.04 server. Given that fraudsters account for a relatively small percentage in the Amazon dataset, we use the under-sampling for training set sampling and small batch training to improve training efficiency. The parameters in GNN are optimised using Adam, the learning rate is set to 0.005, Epoch = 50, batch_size = 256, dropout = 0.5, and Q is initialised to $Q = [0.6, 0.6, 0.6]$ .

We compare the performance of the proposed model with the referenced models using the above-mentioned parameters. The types of comparative models can be classified into single-relation and multi-relation. We merge the three relations into isomorphic graphs for training the single-relation models, such as GCN, GAT, GraphSAGE and DotGat. Multiple relations are utilised for training other multi-relation models. Moreover, the smaller training ratios (10%, 20%, and 40%) are chosen for these models to improve the authenticity. The results are listed in Table and .

Table 2. The model performance on Amazon under various training ratios.

Download CSV Display Table

Table 3. The model performance on Yelp under various training ratios.

Download CSV Display Table

In Table , the single-relation models (GCN, GAT and DotGat) achieve low performance of AUC and Recall. By contrast, GraphSAGE performs well among these single-relation models. The reason is that the models, such as GCN, GAT, and DotGat, rely on the number of training samples. The data for training are the imbalanced real-world data, resulting in insufficient training and poor performance after model training. However, GraphSAGE is trained in small batches on a large graph and has a fixed neighbourhood size. The nodes with less neighbours have less lost information. Moreover, GraphSAGE achieves relatively lower AUC and Recall when the training ratios are 10% and 20%. Nevertheless, the performance is greatly improved when the training ratio is 40%. This result demonstrates that GraphSAGE requires enough samples to train a better model. In comparison with GraphSAGE, the AUC and Recall of MS_HGNN achieve 0.9430 and 0.8849 when the training ratio is 10%, which improve by 31.11% and 28%, respectively. When the training ratio is 40%, the AUC and Recall of MS_HGNN increase by 5.61% and 4.13% over GraphSAGE. The analysis shows that the MS_HGNN utilises small batches, such as GraphSAGE. Moreover, the reinforcement learning is used to train the sampling ratio. This mechanism can address the sampling size of the neighbourhood and reduce the lost information of nodes.

Given that the remaining models belong to the multiple-relation type, the multiple relations are considered in the training process. Here, FdGars and GraphConsis are proposed to address the fraud problems. Therefore, both models perform better than the single-relation models. However, models TAG and Player2Vec perform worse in terms of AUC and Recall.The reason for this might be that neither model is appropriate for the datasets used in this work. Therefore, they do not perform well. Among the multi-relation models, the MS_HGNN introduces the scoring mechanism in each iteration and reinforcement learning to choose the neighbours for training. This mechanism effectively strengthens the training effects. In contrast with GraphConsis, which performs best in the referenced multiple-relation models, the MS_HGNN improves AUC by 6.15% maximum. The performance on Recall is also higher than that of GraphConsis. By analysis, the proposed hybrid model and the way to generate the relation weight have greatly improved the imbalance issue, thus achieving effective performance.

We also select the Yelp dataset to evaluate the performance on different datasets. The results are listed in Table . The performance of MS_HGNN is not as good as that on the Amazon dataset. However, it is also superior to the comparative models.

We compare the proposed MS_HGNN model to several models when Train% = 40% and epoch = 50 for better evaluation. The curves of AUC and Recall with the growth of epochs are shown in Figure (a,b) respectively.

Figure 5. Performance evaluation with various epochs on Amazon dataset. (a) AUC. (b) Recall.

The Amazon dataset shows that the three models, namely, FdGars, GraphSAGE, and Player2Vec, have no evident changes in the AUC with the growth of Epoch. A model can be immediately trained because the model parameters are relatively small. Nevertheless, the trained AUC is not very high. GraphConsis gradually increases and tends to be balanced. Figure (a,b) demonstrate that models of GraphSAGE, MS_HGNN, and GraphConsis perform better in AUC and Recall, indicating that they are more suitable for fraud detection training than the other three models. The model proposed in this work can also achieve higher AUC and Recall with fewer epochs and is higher than other models.

We also evaluate the performance on the Yelp dataset, as shown in Figure , depicting that models RGCN, GraphSAGE, and GraphConsis are extremely unstable. By contrast, FdGars, Player2Vec, and the proposed model MS_HGNN are stable. This trend can also be seen from the Recall indicator of Yelp. Accordingly, the availability of RGCN, GraphSAGE, and GraphConsis on the Yelp dataset is not high. The FdGars and Player2Vec models are proposed for fraud detection. Although the performance is consistent, the AUC and Recall indicators are far lower than the MS_HGNN model. The performance of all models is not satisfying because the Yelp dataset has more nodes and complex relationships. Although the performance of our model is not as good as on the Amazon dataset, it is the highest and most stable on AUC and Recall compared with the other models.

Figure 6. Performance evaluation with various epochs on Yelp dataset. (a) AUC. (b) Recall.

6. Conclusions and future work

Data imbalance is a critical factor that affects the accuracy of fraud detection. This work investigates three types of data imbalance in the graph-based dataset, namely, category imbalance, feature imbalance, and relation imbalance. In this work, we propose a mixed sampling heterogeneous graph neural network model to alleviate the problem of data imbalance. In this model, the entities with the feature most closely related to the fraudsters are sampled for further identification in the subsequent training process, thus alleviating the feature imbalance. Meanwhile, a hybrid sampler is designed and integrated into each iteration of GNN aggregation for the category imbalance problem. In relation to imbalance, a reward/punishment mechanism is proposed based on reinforcement learning. This mechanism utilises the threshold generated by training as the sampling weight between relations. The effectiveness of our proposed model has been demonstrated after a large number of experiments. In the hybrid sampler, long-distance sampling is used to improve the accuracy of detecting fraud nodes by sacrificing the time cost. The next work will further consider the improvements of model performance. Meanwhile, in this work, we mainly test the performance of the proposed model on two datasets. Evaluating the generalisation for other applications is insufficient.In addition, the classic open data set is used in the experiment of this paper, so there is a lack of relevant research on data acquisition. In practical applications, the unbalance problems in the process of data acquisition, filtering, and processing will be more complicated. How to collect and process data to reduce invalid information and obtain better graph topology and feature information is a crucial problem. Therefore, in our next work, the unbalance problem of data acquisition can be studied.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Correction Statement

This article has been corrected with minor changes. These changes do not impact the academic content of the article.

Additional information

Funding

This work was supported by the National Natural Science Foundation of China [grant number 61902167].

References

Awoyemi, J. O., Adetunmbi, A. O., & Oluwadare, S. A. (2017). Credit card fraud detection using machine learning techniques: A comparative analysis. In International Conference on Computing Networking and Informatics (ICCNI) (pp. 1–9).
Google Scholar
Benchaji, I., Douzi, S., & Ouahidi, B. E. (2018). Using genetic algorithm to improve classification of imbalanced datasets for credit card fraud detection (pp. 220–229). Springer.
Google Scholar
Buda, M., Maki, A., & Mazurowski, M. A. (2018). A systematic study of the class imbalance problem in convolutional neural networks. Neural Networks, 106, 249–259. https://doi.org/10.1016/j.neunet.2018.07.011
PubMed Web of Science ®Google Scholar
Chen, J., Hou, H., Gao, J., Ji, Y., & Bai, T. (2019). RGCN: Recurrent graph convolutional networks for target-dependent sentiment analysis. Knowledge Science, Engineering and Management. 2019: 667–675.
Google Scholar
Chen, X., Liang, W., Xu, J., Wang, C., Li, K. C., & Qiu, M. (2021). An efficient service recommendation algorithm for cyber-physical-social systems. IEEE Transactions on Network Science and Engineering, 9(6), 3847–3859. https://doi.org/10.1109/TNSE.2021.3092204
Web of Science ®Google Scholar
Dou, Y., Liu, Z., Sun, L., Deng, Y., & Yu, P. S. (2020). Enhancing graph neural network-based fraud detectors against camouflaged fraudsters. In CIKM'20 (pp. 1–10).
Google Scholar
Du, J., Zhang, S., Wu, G., Moura, J., & Kar, S. (2017).Topology adaptive graph convolutional networks[J]. arXiv preprint arXiv:1710. 10370.
Google Scholar
Ebenuwa, S. H., Sharif, M. S., Alazab, M., & Al-Nemrat, A. (2019). Variance ranking attributes selection techniques for binary classification problem in imbalance data. IEEE Access, 7, 24649–24666. https://doi.org/10.1109/ACCESS.2019.2899578
Web of Science ®Google Scholar
Fatima, E. B., Omar, B., Abdelmajid, E. M., Rustam, F., & Choi, G. S. (2021). Minimizing the overlapping degree to improve class-imbalanced learning under sparse feature selection: Application to fraud detection. IEEE Access, 9, 28101–28110. https://doi.org/10.1109/Access.6287639
Web of Science ®Google Scholar
Fiore, U., Santis, A., Perla, F., Zanetti, P., & Palmieri, F. (2019). Using generative adversarial networks for improving classification effectiveness in credit card fraud detection. Information Sciences, 479, 448–455.
Web of Science ®Google Scholar
Ghoshdastidar, K., Jurgovsky, J., Siblini, W., & Granitzer, M. (2022). NAG: Neural feature aggregation framework for credit card fraud detection. Knowledge and Information Systems, 64(3), 831–858. https://doi.org/10.1007/s10115-022-01653-0
Web of Science ®Google Scholar
Hameed, I. A. (2022). A machine learning and blockchain based efficient fraud detection mechanism. Sensors, 22(19), 7162. https://doi.org/10.3390/s22197162
PubMed Web of Science ®Google Scholar
Hamilton, W. L., Ying, R., & Leskovec, J. (2017). Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems (pp. 1025–1035).
Google Scholar
Hu, B., Zhang, Z., Shi, C., Zhou, J., & Qi, Y. (2019). Cash-out user detection based on attributed heterogeneous information network with a hierarchical attention mechanism. Proceedings of the AAAI Conference on Artificial Intelligence. 33(01): 946–953.
Google Scholar
Jiang, J., Ni, B., & Wang, C. (2021). Financial fraud detection on micro-credit loan scenario via fuller location information embedding. In WWW '21: The Web Conference (pp. 238–246).
Google Scholar
Kavitha, M., & Suriakala, M. (2018). Real time credit card fraud detection on huge imbalanced data using meta-classifiers. In 2017 International Conference on Inventive Computing and Informatics (ICICI).
Google Scholar
Kipf, T. N., & Welling, M. (2016). Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
Google Scholar
Liang, W., Long, J., Li, K. C., Xu, J., Ma, N., & Lei, X. (2021). A fast defogging image recognition algorithm based on bilateral hybrid filtering. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 17(2), 1–16. https://doi.org/10.1145/3391297
Web of Science ®Google Scholar
Liang, W., Xiao, L., Zhang, K., Tang, M., He, D., & Li, K. C. (2021). Data fusion approach for collaborative anomaly intrusion detection in blockchain-based systems. IEEE Internet of Things Journal, 9(16), 14741–14751. https://doi.org/10.1109/JIOT.2021.3053842
Web of Science ®Google Scholar
Liang, W., Zhang, D., Lei, X., Tang, M., Li, K. C., & Zomaya, A. (2020). Circuit copyright blockchain: Blockchain-based homomorphic encryption for IP circuit protection. IEEE Transactions on Emerging Topics in Computing, 9(3), 1410–1420. https://doi.org/10.1109/TETC.2020.2993032
Web of Science ®Google Scholar
Liu, Y., Ao, X., Qin, Z., Chi, J., & He, Q. (2021). Pick and choose: A GNN-based imbalanced learning approach for fraud detection. In WWW '21: The Web Conference (pp. 3168–3177).
Google Scholar
Liu, Z., Chen, C., Yang, X., Zhou, J., & Song, L. (2018). Heterogeneous graph neural networks for malicious account detection. In CIMK'18 (pp. 2007–2085).
Google Scholar
Liu, Z., Dou, Y., Yu, P., Deng, Y., & Peng, H. (2020). Alleviating the inconsistency problem of applying graph neural network to fraud detection. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 1569–1572).
Google Scholar
Long, J., Liang, W., Li, K. C., Wei, Y., & Marino, M. (2022). A regularized cross-layer ladder network for intrusion detection in industrial internet of things. IEEE Transactions on Industrial Informatics, 19(2), 1747–1755. https://doi.org/10.1109/TII.2022.3204034
Web of Science ®Google Scholar
Reurink, A. (2016). Financial fraud: A literature review. Journal of Economic Surveys, 16(5), 1292–1325. https://doi.org/10.1111/joes.12294
Google Scholar
Rtayli, N., & Enneya, N. (2020). Enhanced credit card fraud detection based on SVM-recursive feature elimination and hyper-parameters optimization. Journal of Information Security and Applications, 55, 102596.
Web of Science ®Google Scholar
Velikovi, P., Cucurull, G., Casanova, A., Romero, A., & Bengio, Y. (2017). Graph attention networks (pp. 1–11).
Google Scholar
Vermorel, J., & Mohri, M. (2005). Multi-armed bandit algorithms and empirical evaluation. In European Conference on Machine Learning (pp. 437–448).
Google Scholar
Vinayaka, K. V., & Jaidhar, C. D. (2021). Android malware detection using function call graph with graph convolutional networks. In 2019 IEEE 19th International Conference on Software Quality, Reliability and Security Companion (QRS-C), IEEE.
Google Scholar
Waghade, S. S., & Karandikar, A. M. (2018). A comprehensive study of healthcare fraud detection based on machine learning. International Journal of Applied Engineering Research, 13(6), 4175–4178.
Google Scholar
Wang, D., Lin, J., Cui, P., Jia, Q., Wang, Z., Fang, Y., & Qi, Y. (2020). A semi-supervised graph attentive network for financial fraud detection. In 2019 IEEE International Conference on Data Mining (ICDM) (pp. 598–607).
Google Scholar
Wang, J., Wen, R., Wu, C., Huang, Y., & Xion, J. (2019). FdGars: Fraudster detection via graph convolutional networks in online app review system. In Companion the 2019 World Wide Web Conference (pp. 310–316).
Google Scholar
Wang, X., Ji, H., Shi, C., Wang, B., Ye, Y., Cui, P., & Yu, P. S. (2019). Heterogeneous graph attention network. In The World Wide Web Conference (pp. 2022–2032).
Google Scholar
Wang, Y., & Xu, W. (2018, January). Leveraging deep learning with LDA-based text analytics to detect automobile insurance fraud. Decision Support Systems, 105, 87–95. https://doi.org/10.1016/j.dss.2017.11.001
Web of Science ®Google Scholar
Zhang, Y., Fan, Y., Ye, Y., Zhao, L., & Shi, C. (2019). Key player identification in underground forums over attributed heterogeneous information network embedding framework. In The 28th ACM International Conference (pp. 549–558).
Google Scholar
Zhang, Y., Liu, G., Zheng, L., & Yan, C. (2019). A hierarchical clustering strategy of processing class imbalance and its application in fraud detection. In IEEE 21st International Conference on High Performance Computing and Communications (pp. 1810–1816).
Google Scholar

MS_HGNN: a hybrid online fraud detection model to alleviate graph-based data imbalance

Abstract

1. Introduction

2. Related work