830
Views
0
CrossRef citations to date
0
Altmetric
Research Article

The node importance evaluation method based on graph convolution in multilayer heterogeneous networks

, &
Article: 2229964 | Received 08 Dec 2022, Accepted 22 Jun 2023, Published online: 15 Jul 2023

ABSTRACT

Node importance evaluation is a hot issue in complex network analysis. Existing node importance evaluation methods are mainly oriented to homogeneous networks, which ignore the heterogeneity of node types and edges. We propose an MLN critical node evaluation method based on graph convolution. In this paper, we generate the feature matrix of nodes. Considering the diversity of node types in the network, we design an adapted node sampling method based on the meta path. An MLN node embedding model is constructed based on a graph convolutional network (MGC). Besides, the negative sampling technique is used to complete MGC training. Metrics of critical node evaluation are constructed by combining the node embedding vectors and local structural features to evaluate the node's importance. The experimental results show that the proposed method has better evaluation accuracy than the K-Shell algorithm (K-Shell), K-shell-based gravity model ranking algorithm (KSDG), the Page Rank algorithm in MLN (PR), influence maximization based on network embedding (IMNE) and the node ranking algorithm based on information entropy (ERM).

1. Introduction

Most real networks are usually composed of a large number of different types of nodes and edges (Shi et al., Citation2021), such as social networks, computer networks and biological networks. To better analyse such networks, most research scholars usually regard them as a homogeneous network (HON). HON does not consider the heterogeneity of object types and relationships. Therefore, networks with rich types of nodes and edges are regarded as a heterogeneous information network (HIN) as shown in Figure .

Figure 1. Illustration of HIN.

Figure 1. Illustration of HIN.

Scholars have done a lot of research on the basic problems of HIN, such as basic metrics (Yang et al., Citation2022; Huang, Zhou, et al., Citation2022), community structure detection (Huang, Chen, et al., Citation2022; Contisciani et al., Citation2020; Sun et al., Citation2020) and diffusion (Wang et al., Citation2021 ). Recent years have seen a growing interest in graph contrastive Learning, such as Zhao et al. (Citation2021) consider that directly applying existing graph self-supervised learning methods to heterogeneous graphs can not fully capture the rich semantics and their correlations in heterogeneous graphs. They investigate self-supervised learning on heterogeneous graphs and propose a novel model named Multi-View Self-supervised heterogeneous graph Embedding (MVSE). By encoding information from different views defined by meta paths and optimising both intra-view and inter-view contrastive learning tasks, MVSE comprehensively utilises unlabelled information and learns node embeddings. Zhu et al. (Citation2022) present a novel heterogeneous graph contrastive Learning method and generate multiple semantic views for HGs based on meta-paths and propose a novel multi-view contrastive aggregation objective to adaptively distil information from each view. Chen et al. (Citation2023) propose heterogeneous graph contrastive learning method which enhances heterogeneous graph contrastive learning with meta networks to allow the personalised knowledge transformer with adaptive contrastive augmentation. It can incorporate heterogeneous relational semantics into the user-item interaction modelling with contrastive learning-enhanced knowledge transfer across different views.

Node importance evaluation is an important branch of HIN analysis. Evaluating critical nodes in HIN can effectively control the process of information propagation, such as accurately placing e-commerce advertisements to promote products, controlling the forwarding of information to suppress the spread of rumours, etc. However, most existing methods model the networks as aggregated structures without considering the inter-layer interactions, which lose semantic information. Multilayer structure pays more attention to heterogeneity in multilayer heterogeneous networks (MLN) (Kivela et al., Citation2014). A multilayer structure not only directly reflects the interactions between different layers but also preserves the more detailed information of the network. Node importance can be evaluated more comprehensively by extracting structural and semantic information.

Node importance evaluation has become a research hotspot in many fields. For example, finding out the most influential users can effectively regulate the spread of public opinions (Liu & He, Citation2020). Traffic congestion can be effectively reduced by finding critical traffic hubs (Liu et al., Citation2022). In the disease transmission network, finding the superinfection source can effectively block the infection source in time and prevent the spread of the disease (Li et al., Citation2021).

In this paper, we propose a node importance evaluation model based on a graph convolution network in MLN. The feature matrix of nodes is constructed based on topological structure attributes and semantic information. The node embedding model is constructed, which includes intra-graph semantic aggregation and inter-graph semantic aggregation. We evaluate the critical node by metrics, which are constructed based on node embeddings and structural features.

The main contributions of this paper include two aspects:

  • An adapted node sampling method based on meta path is designed. Considering the unbalanced distribution number of different type nodes in MLN, we define a formula for calculating the coverage rate of sampling nodes.

  • A node importance evaluation method in a multilayer network based on graph convolution is proposed, which is an effective method of identifying important nodes in MLN. It includes a node embedding model and metrics based on node embeddings and structural features.

The paper’s overall structure is organised as follows. Section 2 reviews some related work. Section 3 illustrates the basic concepts of multilayer networks. Section 4 introduces the method proposed in the paper. Section 5 discusses the results obtained in the experimental part. Section 6 summarises the method and points out the future research direction.

2. Related work

Because of the various types of nodes and connections in HIN (Zhou et al., Citation2022), scholars have done a lot of research on node importance evaluation. The node importance evaluation methods of HIN fall into two categories: meta path structure and multilayer structure.

2.1. Node importance evaluation methods based on the meta path method

The method usually constructs evaluation metrics based on the meta path to get node importance ranking. Yang et al. (Citation2019) proposed a method based on meta path information entropy (MPIE). They transformed HIN into several homogeneous networks. Information entropy was used to describe the complexity and uncertainty of social impact. The ultimate node importance was calculated by integrating nodes’ influence in different meta paths. However, this method needs to traverse all public paths between node pairs when calculating the indirect influence of nodes, leading to high time complexity. Cun and Zhou (Citation2021) proposed the adjacent information entropy model based on meta path and meta graph. This method integrated the structure and semantic information to measure node importance. Liu et al. (Citation2020) considered the semantic information captured by a meta path is not enough to measure the importance of a node in HIN. They proposed a method based on the main meta path and auxiliary meta path. An auxiliary meta path was added to supplement the semantic information of the main meta path. The node correlation ranking was calculated by the main meta path. The node internal ranking was calculated by the auxiliary meta path. Molaei et al. (Citation2020) proposed a method based on information entropy (ERM). ERM defined three different types of probability entropy according to the meta path. The ultimate node importance was calculated by integrating the entropy in different meta paths. ERM focused on the information of first-order neighbours and high-order neighbours of nodes but ignored the node characteristics influence. Yang, Zhou, et al. (Citation2021) The nodes’ direct influence was constructed based on the nodes’ embedding vectors. The indirect influence is constructed based on the number of nodes’ common neighbours. The ultimate node importance was fused according to nodes’ direct influence and indirect influence. Those methods are faced with some problems, such as evaluating node importance based on specific meta paths or ignoring the node characteristics.

2.2. Node importance evaluation methods based on the multilayer structure

Node importance evaluation methods based on the multilayer structure can make full use of the rich node attribution information and comprehensive structure information in the network. Pedroche et al. (Citation2016) defined the PageRank of MLN based on the inter-layer interactions. Shu et al. (Citation2022) proposed a method for multiplex heterogeneous networks based on graph embedding. For one type and different types of edges, the nodes’ features were aggregated after random walk sampling neighbour nodes. The nodes’ features were mapped to the embedding space by multilayer perceptron to obtain the embedding vectors. Then, the node importance evaluation metrics were constructed by the nodes’ embedding vectors and local structure features. Besides, some researchers have focused more on how inter-layer connectivity affects the node importance of MLN. Rahmede et al. (Citation2018) proposed a method based on network structure. They designed equations to calculate layers’ influences and nodes’ centrality. Wu et al. (Citation2019) used a tensor framework to calculate the node’s eigenvector centrality. For a given form of inter-layer influence, they proved the uniqueness of the existence of eigenvector centrality. Luo et al. (Citation2020) defined the local clustering coefficient with the ternary closure obtained by random walks in different layers. Then, they calculated node importance in the directed weighted network and the undirected weighted network with D-S evidence theory. Wan et al. (Citation2022) divided nodes into core and auxiliary layers based on relational characteristics. The auxiliary layer nodes’ importance was quantified by designing the interlayer influence. The different interlayer influence was fused according to different weights. The nodes’ importance was fused according to the node transmission characteristics in the core layer and auxiliary layer. Wang et al. (Citation2023) evaluated nodes’ importance by calculating nodes’ local structure entropy. They considered the nodes’ importance of intra-layer and inter-layer. Most of the above methods build multilayer homogeneous networks and quantify the importance of nodes through inter-layer connections and intra-layer topology information.

The node importance evaluation methods of multilayer homogeneous networks ignore the deep mining of multitype edges and nodes’ information in MLN. In addition, those methods also face the problem of how to quantify the inter-layer influence according to the inter-layer connectivity. Thus, we use a multilayer structure to model MLN, considering the interaction between different node types. Besides, random walk sampling based on meta path is used to construct a semantic graph, which includes the inter-layer and intra-layer interactions. We consider not only the nodes’ local information but also the different semantic information. Node features in the semantic graph are aggregated by using the attention mechanism and average pooling. In addition, we construct the metrics of critical node evaluation based on node embedding vectors and structural features.

3. Relate concepts

This section introduces the basic definitions of HIN. Aiming at the various types of nodes and edges in the original network, the paper uses a multilayer structure to model MLN.

3.1. Basic definition

Definition 3.1:

Heterogeneous network (Luo et al., Citation2014). HIN is a directed graph G=(V,E) with an object-type mapping function φ:VA and a relationship-type mapping function ψ:ER. Each object vV belongs to one particular object type in the object type set A:φ(v)A, and each relationship eE belongs to a particular relationship type in the relationship type set R:ψ(e)R, When the number of relationship types |R| > 1 or the number of object types |A| > 1 in the network, the network is heterogeneous.

Definition 3.2:

Network schema (Sun et al., Citation2011). The network schema is defined as Tg=(A,R), which is a meta-description of the network G=(V,E). Network schema is also a constraint used to define object types and relationship types, which makes HIN has the characteristics of a semi-semantic structure.

Definition 3.3:

Meta path (Sun et al., Citation2011). A meta path P is a set of nodes defined on the network schema Tg=(A,R), which have predecessor and successor connections. Meta path is expressed as A1R2A2R3A3R4A4.RlAl+1. R=R1R2Rl defines a compound relationship between types A1 and Al+1, where the combination operator on the relationship is represented.

Definition 3.4:

Semantic graph (Luo et al., Citation2014). The semantic graph M is a directed acyclic graph with a single source node vi and a single target node vj, which is defined on G with the network schema Tg. The semantic graph is a set of interactive nodes based on a specific type of meta path, which expresses specific semantic information.

3.2. Multilayer network

A multilayer network (Kivela et al., Citation2014) with a multilayer structure is often used to model complex interaction structures between nodes. According to the definition of a multilayer network proposed by Boccaletti et al. (Citation2014), the multilayer network is expressed as M=(S,C), where S={Gα;α{1,,n}} is the set of subgraphs, Gα=(Xα,Eα), Xα is the node set of the subgraph Gα and Eα is the edge set of subgraph Gα where C={EαβXα×Xβ} is the set of edges between different subgraphs Xβ, α,β{1,,n}, αβ and Eαβ are the edges between subgraph Gα and subgraph Gβ. To clearly show the relationship between different types of nodes in the network, it is usually modelled as MLN, as shown in Figure . In Figure , the nodes on the inter-layer share the same type, and nodes on different layers have different types. Eα is an edge on the layer α, and Eαβ is an edge between the layer α and the layer β.

Figure 2. Illustration of MLN.

Figure 2. Illustration of MLN.

4. Construction of the evaluation model

This section will address the construction of an evaluation model for critical nodes of MLN. The node feature matrix F and adjacency matrix Λ are used as the inputs of multilayer graph convolution (MGC). The output of the model is the embeddings of nodes. Finally, the evaluation index of critical nodes is constructed and the importance ranking of nodes is obtained according to the importance scores of nodes.

4.1. Model structure

The overall structure of MLN node importance evaluation (MNIE) is shown in Figure . MNIE consists of three parts: the construction of the feature matrix and adjacency matrix, the multilayer graph convolution model (MGC) and node importance calculation. The inputs of the MGC model are the adjacency matrix ΛRN×N and node feature matrix FRN×d, where N is the number of nodes and d is the dimension of the feature vector. In Figure , ziMk represents the embeddings of nodes vi after intra-graph semantic aggregation. Zi represents the embeddings of the node vi after inter-semantic graph aggregation. TS is the evaluation index of information propagation ability. IN is the evaluation index of influence. GI(i) is the importance score of the node vi.

Figure 3. Structure of the MLN node importance evaluation model.

Figure 3. Structure of the MLN node importance evaluation model.

4.2. Construction of feature matrix and adjacency matrix

4.2.1. Feature matrix

In the paper, the feature matrix F includes two parts: topological features and semantic features. Centrality indexes are employed to construct topological features. The four centrality indexes are K-shell value (KS) (Kitsak et al., Citation2010), local agglomeration coefficient (LC) (Freeman, Citation2002), degree centrality (DC) (Sabidussi, Citation1966) and feature vector centrality (EC) (Bonacich & Lloyd, Citation2001). The semantic features are constructed based on the information entropy, which is based on the first-order neighbour information of the node and the neighbour information on the meta path (Cun & Zhou, Citation2021). Therefore, the feature matrix F can be achieved by formula Equation(1). (1) F=[LC1DC1EC1KS1IE1LC2DC2EC2KS2IE2LCiDCiECiKSiIEi](1)

Information entropy IE is calculated by formula Equation(2) (Cun & Zhou, Citation2021). (2) IE=EL+EI(2) where link entropy EL is calculated by formula Equation(3) (Cun & Zhou, Citation2021). (3) EL=ΘRjNΘ(i)(Pijlog2Pij)(3) where R is the relation set, NΘ(i) is the neighbour node set of the node vi on the relation Θ and Pij is the selection probability of neighbour nodes vj. Pij=Di/DNΘ(j), where Di is the degree of the node vi, NΘ(j) is the neighbour nodes of the node vj on the relation Θ,DNΘ(j) is all degree of NΘ(j). For example, the node vi has a neighbour node vj, node vjhave neighbor nodes {vj1, vj2}. When Dvi=1, Dj1=1, Dj1=2, we can calculate Pij=0.528.

Interactive entropy EI is calculated by formula Equation(4) (Cun & Zhou, Citation2021). (4) EI=pPjNp(i)PijlogPij(4) where P is the meta path set, p is the meta path of P, Np(i) is the neighbour node set of node vi in the meta path p. Pij=Di/DNp(j), where DNp(j) is all degree of the neighbour nodes of the node vj in the meta path p.

To make different features be the same order of magnitude, the features need to be normalised. The feature vali of the node vi is normalised by formula Equation(5). (5) vali=rankN(5) where rank is the rankings of node vi scores according indexes LC, DC, EC, KS and IE , and N is the total number of nodes.

4.2.2. Adjacency matrix

We sample nodes according to different meta paths to construct different semantic graphs. The adjacency matrix is constructed according to the semantic graph.

Considering different types of node number in MLN is different (Chen et al., Citation2018), random sampling will lead to over-sampling nodes with a small number and under-sampling nodes with a large number. Therefore, considering the unbalanced distribution of different types of nodes, we propose an adapted node sampling method based on meta path (AS-BMP) to achieve a semantic graph. AS-BMP is shown in algorithm 1. The semantic graph is shown in Figure .

Figure 4. Construction of the semantic graph.

Figure 4. Construction of the semantic graph.

The calculation of the coverage μ in algorithm 1 is shown in formula Equation(6). (6) μ=B1N+B2N++BiN(6) where N is the total number of nodes in M, Bi is the number of nodes in mk with type i.

4.3. MGC model

Heterogeneous network embedding aims to embed nodes into low-dimensional vector space by using topological structure information and node attribute information of the network while retaining the inherent structural and semantic features of the network (Zhou et al., Citation2022). If only the first-order neighbour information is aggregated in the embeddings of MLN, the information of nodes of all types cannot be captured. For example, in the scientific cooperation network, there is no direct connection between the authors’ nodes, but there is a close cooperative relationship among the authors under the semantics of co-writing papers. Such semantic features cannot be obtained by first-order neighbours (Liang et al., Citation2022). Therefore, we design the MGC model, which aggregates neighbour node information on different semantic graphs to construct different semantic features. Then the semantic features on different semantic graphs are aggregated to obtain the final feature vectors of nodes. The MGC model includes intra-graph semantic aggregation and inter-graph semantic aggregation. The intra-graph semantic aggregation employs the attention mechanism (Zhang et al., Citation2019) to aggregate the features of neighbour nodes of the node vi. The inter-graph semantic aggregation takes average pooling to aggregate the semantic features of neighbour nodes of the node vi, which are on different semantic graphs.

4.3.1. Intra-graph semantic aggregation

The neighbouring nodes of the node vi on the same semantic graph have different importance. Thus, a self-attention mechanism (Zhang et al., Citation2019) is employed to aggregate neighbour features.

In Figure , NiMk={ai1Mk,ai2MkaijMk} represents the neighbour node set of the node vi on the semantic graph Mk. The embedding ziMk of nodes vi is computed by formula Equation(7). (7) zi=jΓMk(vi)ωijMkFj(7) where ΓMk(vi) is the neighbour set of the node vi on Mk, Fj is the feature vector of the neighbour node aijMk. The calculation of neighbour feature weight ωijMk is shown in formula Equation(8). (8) ωijMk=exp{σ(ϵ[FiFj])}λΓMk(vi)exp{σ(ϵ[FiFλ])}(8) where σ is the activation function, is the dot product operation and ϵ is the attention weight matrix. Then, through intra-graph semantic aggregation, the embeddings of the nodes are z={z1Mk,z2Mk,z4Mk,,ziMk}.

Figure 5. Intra-graph semantic aggregation.

Figure 5. Intra-graph semantic aggregation.

4.3.2. Inter-graph semantic aggregation

Considering that nodes have different semantic features on different semantic graphs, the average pooling method, as shown in Figure , is used to aggregate the features of the node vi. The embedding Ziof nodes vi after inter-graph semantic aggregation is calculated by Equation(9). (9) Zi=k|M(i)|ziMk|M(i)|(9) where |M(i)| is the number of semantic graphs containing the node vi. After inter-graph semantic aggregation, the embeddings of nodes are Z={Z1,Z2,,Zi}.

Figure 6. Inter-graph semantic aggregation.

Figure 6. Inter-graph semantic aggregation.

4.4. Model training

To improve the convergence speed of the MGC model, the negative sampling technology is used to optimise the model parameters by comparing positive and negative samples to complete the training of the MGC model.

According to the type of meta path, node sampling is carried out to obtain the node-set Path={pathi|1i|V||}, where pathi is the path generated by sampling started from the node vi. For the node vi, the positive sample vr is obtained by randomly sampling the neighbouring nodes within the third-order neighbours of vi on pathi, and the negative sample vn is obtained by randomly sampling a node on another path. Therefore, the training sample set Spath of the MGC model composed of all triples (vr,vi,vn) is obtained. The loss function of MGC is shown in formula Equation(10). (10) l=(vr,vj,vn)Spath{lnσ[Zvrzvi]+lnσ[ZvnZvj]}(10) where σ is the sigmoid activation function and Z is the embeddings of nodes after inter-graph semantic aggregation.

4.5. Node importance

This section mainly introduces the metric construction of critical node evaluation, including node propagation ability and node influence. The node importance score is obtained by the node propagation ability and node influence.

4.5.1. Node propagation ability

In MLN, the ability of propagation is affected by the local structure of nodes. The K-shell value represents the compactness of the topological structure around the node. The tighter the structure, the higher the node efficiency in spreading information (Kitsak et al., Citation2010). In addition, information entropy IE can describe the complexity and uncertainty of information propagation among nodes in social networks (Cun & Zhou, Citation2021). Thus, we combine the K-shell and information entropy to construct the propagation ability of a node. The propagation ability of any node vi is calculated as shown in formula Equation(11). (11) TS(i)=KsijΓ(i)Ksj×eIEj(11) where Ksi is the k-shell of the node vi, IEj is the information entropy of the node vj, and Γ(i) is the first-order neighbour node set of the node vi on different semantic graphs.

4.5.2. Node influence

In MLN, the more the similar features, the more likely there is information interaction between nodes. The embeddings retain the structural features of the network (Freeman, Citation2002). In addition, the interaction between nodes will decrease when the Euclidean distance between embedding vectors becomes longer (Kitsak et al., Citation2010). Moreover, the more paths through the target node, the greater the influence of the target node (Liu et al., Citation2019). Thus, we take the ratio of the number of paths through the node vi and the total number of paths as the influence coefficient λi of the node vi. The influence IN(i) of the node vi is calculated by formula Equation(12). (12) IN(i)=jΓ^(i)λieΩij(12) where Ωij is the Euclidean distance between embeddings, Γ^(i) is the set of neighbour nodes sharing the same type within the third order on different semantic graphs. λi is computed by formula Equation(13) (Liu et al., Citation2019). (13) λi=pviapnum(13) where pnum is the total path number passing through the node vi, and pvia is the number of paths passing through its neighbour vj.

4.5.3. Node importance calculation

The node importance GI(i) is obtained by the propagation ability and influence of nodes. GI(i) is calculated by formula Equation(14). (14) GI(i)=rTS(i)+(1r)IN(i)(14) where r is the weight of node propagation ability. r is determined by experiment, which can be seen in detail in the next section.

4.5.4. Time complexity analysis

The time complexity of preprocessing the node sampling probability based on the number of nodes and random walk sequences based on the meta path is shown in Formula Equation(15). (15) O(nlhk)(15) where the number of nodes is n, the length of walk sequences is l, the number of walk sequences is h and the types of meta-path is k.

The complexity of MGC training is shown in Formula Equation(16). (16) O(nlhkbe)(16) where the number of negative samples is b and the number of model iterations is e.

The complexity of node propagation ability calculation based on the k-shell and IE value of neighbours is shown in Formula Equation(17). (17) O(na)(17) where the mean number of neighbours is a.

The complexity of node influence calculation based on the third-order neighbour node is shown in Formula Equation(18). (18) O(na3)(18)

The complexity of weighted summation and ranking is shown in Formula Equation(19). (19) O(n)+O(nlogn)(19)

The total time complexity is shown in Formula Equation(20). (20) O(nlhk+nlhkbe+na+na3+n+nlogn)(20)

Since l, k, h, b, e and a are usually small constants, the time complexity can be simplified, as shown in Formula Equation(21). (21) O(n+nlogn)(21)

5. Experiment

To verify the effectiveness of the evaluation model, experiments are conducted on two real MLN datasets. The comparison method includes the K-shell algorithm (Kitsak et al., Citation2010), the Page Ranking algorithm in MLN (PR) (Tu et al., Citation2018), the entropy ranking algorithm (ERM) (Molaei et al., Citation2020), the K-shell-based gravity model ranking algorithm (KSDG) (Yang & Xiao, Citation2021) and influence maximisation based on network embedding (IMNE) (Yang, Zhou, et al., Citation2021). The basic information of each experimental dataset is shown in Table , in which the meta path type is referenced from the literature (Yang, Guan, et al., Citation2021).

  1. DBLP (Yang, Guan, et al., Citation2021). It is a paper network in the computer field which contains four types of nodes: Author (A), Paper (P), Conferences (C) and Term (T). The types of edges include the following: the author writes the paper or the paper is written by the authors (PA); the paper is published in a conference or included in a conference (PC); the paper contains terms or terms are included in the paper (PT).

  2. ACM (Yang, Guan, et al., Citation2021). The dataset from ACM Digital Library in 2010 includes data from 14 representative computer science conferences. Data contain three types of nodes: Papers (P), Authors (A) and Conferences (C). The types of edges include the following : the authors write the paper or the paper is written by the authors (PA); the paper is published in a conference or included in a conference (PC).

Table 1. Datasets.

5.1. Evaluation method

The method based on the propagation model and the method based on node deletion are employed to evaluate the proposed method MNIE.

5.1.1. Method based on propagation

The evaluation method based on the propagation model mainly considers the ability of infecting other nodes. In the paper, the susceptible influenced model (SIR) (Avram et al., Citation2022) is used to evaluate the propagation information ability of nodes from the view of dynamics. The nodes in the SIR model fall into three states: susceptible state, infectious state and immune state. In the SIR model, the settings of the infection rate δ and recovery rate γ depend on the topology of the network. Different semantic graphs of MLN have different topological structures, so δ and γ should be set for different semantic graphs. In the experiment, δ is set as the threshold of infectious disease progressive propagation of each semantic graph. γ is the reciprocal of the average degree on each semantic graph. The Top-100 nodes calculated by different methods are used as infection sources. After t times of infection, the propagation coverage f(t) is the ratio of the infected nodes’ number in all semantic graphs and the total node numbers in all semantic graphs. In the experiment, we take the average of 10 independent experiments as the experimental results. The calculation of f(t) is shown in Formula Equation(22). (22) f(t)=l=1kinf_nodes(Ml)all_Num(M)(22) where inf_nodes(Ml) is the total number of infected nodes on the semantic graph Ml after the t iteration, all_Num(M) is the total number of nodes on all semantic graphs and k is the number of semantic graphs.

5.1.2. Method based on deletion

Connected node pair is an index of network robustness (Fan et al., Citation2020), which describes the connectivity between nodes in the network. By observing the changes in the number of nodes in the maximum connected graph, we can judge the influence of nodes on the network topology. After the nodes are removed, the greater change of the number of nodes in the maximum connected graph, the greater the structure change of the overall network, meaning that the nodes are more important. The network connectivity is computed by formula Equation(23): (23) Con=l=1GMlkMax(Num(G))all_Num(M)(23) where G is a connected subgraph on the semantic graph Ml, Max(Num(G)) is the nodes number in the maximum connected subgraph, all_Num(M) is the total node number of all semantic graphs.

5.2. Experimental results and analysis

5.2.1. Determination of r

Different r has different effects on the ranking quality of nodes’ importance. The top 200 nodes obtained by the MNIE method are divided into 10 groups, let's say gp1 to gp10, which are used as infection sources in SIR and SI models. The nodes of gpi will be more important than the nodes of gpi+1, so that the infection range of gpi is larger than that of the node of gpi+1. For the SI model, when t=T2, the propagation coverage f(t) of each method has obvious discrimination. Therefore, the value of f(t) is taken as MAX-SI. T is the iteration step when the number of infected nodes is stable. For the SIR model, the maximum value of propagation coverage f(t) is taken as MAX-SIR.

In the experiment, we selected the r value, which can make MAX-SI and MAX-SIR approximately descent from gp1 to gp10, so that the ranking quality of nodes’ importance is better. Considering that the propagation of infection is accidental, the average value of 10 independent experiments is taken.

As can be seen from Figure , there are 11 different colour curves with a range of r from 0 to 1 and an interval of 0.1, which describes MAX-SI and MAX-SIR values when r takes different values. Specifically, as shown in Figure (a), when the r is 0.6 in the ACM dataset, the MAX-SI approximately descent from gp1 to gp10. It is worth noting that the change curve of the MAX-SI value of r = 0.7 also has a similar effect as r = 0.6. However, it can be seen in Figure (b) that the MAX-SIR is not an obvious downward trend when r = 0.7. Thus, on the ACM dataset, it indicates when r=0.6 can obtain better node ranking quality. As shown in Figure (c,d), when the r value is 0.3 in the DBLP dataset, the MAX-SI and MAX-SIR approximately descend from gp1 to gp10. It indicates that r=0.3 can obtain better node ranking quality.

Figure 7. Comparison diagram of max-SIR and max-SI. (a) The impact of r value on max-SI in ACM, (b) The impact of r value on max-SIR in ACM, (c) The impact of r value on max-SI in DBLP, (d) The impact of r value on max-SIR in DBLP.

Figure 7. Comparison diagram of max-SIR and max-SI. (a) The impact of r value on max-SI in ACM, (b) The impact of r value on max-SIR in ACM, (c) The impact of r value on max-SI in DBLP, (d) The impact of r value on max-SIR in DBLP.

In summary, on the ACM dataset, the r is 0.6. Similarly, on the DBLP dataset, the r is 0.3.

5.2.2. Accuracy and ranking quality

We use the SI model and SIR model to evaluate the accuracy and ranking quality of critical nodes obtained by each method. We select the top 100 nodes obtained by different methods (K-shell, PR, ERM, KSDG and MNIE) as the infection source of the SI model. The experimental results of propagation coverage f(t) are shown in Figure .

Figure 8. Comparison diagram of SI. (a) ACM dataset. (b) DBLP dataset.

Figure 8. Comparison diagram of SI. (a) ACM dataset. (b) DBLP dataset.

As shown in Figure , compared with other methods, f(t) of MNIE is higher when t between 3 and 20. On the other hand, f(t) of MNIE reaches a steady state when the iteration step is 20, while steady state other methods are later. It confirms the high efficiency of MNIE.

In addition, these methods have little difference in infected nodes at the beginning of the iteration, but only an obvious difference at the late iteration. Because at the beginning of the iteration, there are few infected nodes, the infection range is not much different. With the increase in iteration steps, the number of infected nodes increases exponentially, showing obvious differences. Moreover, the maximum infection range of each method is not much different after iterative stability. The reason is that the network has strong connectivity and the infection source can infect most nodes after enough iteration.

As shown in Figure , compared with ERM, MNIE is leading in infection coverage on two datasets in the middle of an iteration. The main reason is that ERM is a single evaluation metric of information entropy based on the meta path, which will lead to excessive dependence on the type of the meta path. While MNIE not only mines topological structure and semantic information based on the meta path but also considers the influence of node features on node importance. The infection coverages of K-shell on DBLP and ACM datasets are the worst. The reason is that the K-shell cannot further distinguish the nodes’ importance on the same shell and ignores the semantic information of nodes in heterogeneous networks.

As shown in Figure , the top 100 nodes of methods (K-shell, PR, ERM, IMNE, KSDG and MNIE) are divided into 10 groups, such as GP={gp1,gp2,..,gp10}, which are used as infection sources to spread infection in the SIR model. Considering that the spread of infection is accidental, the average value is taken after 10 independent experiments.

Figure 9. Comparison diagram of MAX-SIR. (a) ACM dataset. (b) DBLP dataset.

Figure 9. Comparison diagram of MAX-SIR. (a) ACM dataset. (b) DBLP dataset.

Intuitively, if MAXSIR  approximately descend from gp1 to gp10, the ranking quality of nodes’ importance is better. On the ACM dataset, the MAX-SIR values of gp1,gp2,gp3,gp4 obtained by MNIE were higher than those of other methods. It indicates that the critical nodes of gp1,gp2,gp3,gp4 evaluated by MNIE have better ranking quality. On the DBLP dataset, the critical nodes of gp1,gp2,gp3,gp4,gp5,gp6 evaluated by MNIE have better ranking quality. The reason is that MNIE pays attention to not only node features on the nodes’ importance but also local structure information.

In the ACM dataset, the experimental results of MNIE show that except for the gp4 group, nodes have smaller MAX-SIR values than gp5 group nodes. The reason is that there are nodes, which should have been sorted in gpi, sorted in gpik. There are many such situations in the nodes obtained by other methods.

In summary, the critical nodes evaluated by MNIE have better accuracy and ranking quality.

5.2.3. Network connectivity impact analysis

To further verify the accuracy of the critical nodes obtained by each method, we compute the network connectivity after deleting critical nodes in sequence.

As can be seen from Figure , Con of the top 100 nodes obtained by deleting MNIE on two datasets drops faster. It means that MNIE has advantages over other methods. When the nodes obtained by the K-shell algorithm are deleted, the decrease in network connectivity is less obvious than in K-shell, PR, ERM, IMNE and KSDG. Because the K-shell algorithm ignores the influence of semantic information on HIN connectivity. Besides, when the nodes obtained by IMNE are deleted, the decrease in network connectivity is less than MNIE. Although the IMNE method pays attention to node features, it ignores local structure information on the nodes’ importance. When the nodes obtained by ERM are deleted, the decrease in network connectivity is less than MNIE. Although the ERM method pays attention to the information of first-order neighbours and high-order neighbours, it ignores node features on the nodes’ importance.

Figure 10. Comparison chart of network connectivity. (a) ACM dataset. (b) DBLP dataset.

Figure 10. Comparison chart of network connectivity. (a) ACM dataset. (b) DBLP dataset.

6. Conclusion

Aiming at the existence of various types of nodes and edges in MLN, we proposed a critical node evaluation method for MLN. We used the meta path to mine semantic information and structural information between nodes, to effectively evaluate critical nodes in MLN. Experimental results show that compared with K-shell, PR, ERM and KSDG methods, our method can more accurately evaluate the critical nodes in MLN. In the paper, the inter-graph semantic aggregation needs to be further optimised, which ignores the differences in the importance of node features on different semantic graphs. In addition, because the paper constructs the heterogeneous network as a static heterogeneous network, it ignores the dynamic nature of the network. In future, the dynamic impact of MLN on critical node evaluation will be further considered to improve the evaluation accuracy of critical nodes in MLN.

Acknowledgements

The authors would like to thank the anonymous reviewers for their constructive and insightful comments on the paper.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported by the National Natural Science Foundation of China [grant number 62062050], the Natural Science Foundation of Jiangxi Province [grant number 20202B ABL2020 39] and the Innovation Foundation for Postgraduate Student of Jiangxi Province [grant number YC2021-S709].

References

  • Avram, F., Freddi, L., & Goreac, D. (2022). Optimal control of a SIR epidemic with ICU constraints and target objectives. Applied Mathematics and Computation, 418, Article 126816. https://doi.org/10.1016/j.amc.2021.126816
  • Boccaletti, S., Bianconi, G., Criado, R., Del Genio, C. I., Gómez-Gardenes, J., Romance, M., & Zanin, M. (2014). The structure and dynamics of multilayer networks. Physics Reports, 544(1), 1–122. https://doi.org/10.1016/j.physrep.2014.07.001
  • Bonacich, P., & Lloyd, P. (2001). Eigenvector-like measures of centrality for asymmetric relations. Social Networks, 23(3), 191–201. https://doi.org/10.1016/S0378-8733(01)00038-7
  • Chen, H., Yin, H., Wang, W., Wang, H., Nguyen, Q. V. H., & Li, X. (2018). PME: Projected metric embedding on heterogeneous networks for link prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 1177–1186).
  • Chen, M., Huang, C., Xia, L., Li, X., & Zhang, M. (2023). Heterogeneous graph contrastive learning for recommendation. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining (pp. 544–552).
  • Contisciani, M., Power, E. A., & De Bacco, C. (2020). Community detection with node attributes in multilayer networks. Scientific Reports, 10(1), 1–16. https://doi.org/10.1038/s41598-020-72626-y
  • Cun, X., & Zhou, L. (2021). Influence maximization based on adjacency entropy in heterogeneous information networks. Application Research of Computers, 38(11), 3304–3309. https://doi.org/10.19734/j.issn.1001-3695.2021.04.0138
  • Fan, C., Zeng, L., Sun, Y., and Liu, Y. (2020). Finding key players in complex networks through deep reinforcement learning. Nature Machine Intelligence, 2(6), 317–324. https://doi.org/10.1038/s42256-020-0177-2
  • Freeman, L. C. (2002). Centrality in social networks: Conceptual clarification. Social Networks, 24(3), 238–263. https://doi.org/10.1016/S0378-8733(02)00018-4
  • Huang, R., Chen, Z., He, J., & Chu, X. (2022). Dynamic heterogeneous user generated contents-driven relation assessment via graph representation learning. Sensors, 22(4), 1402. https://doi.org/10.3390/s22041402
  • Huang, T., Zhou, C., Zhang, R. X., Wang, Z., Li, K., & Qiu, M. (2022). Learning tailored adaptive bitrate algorithms to heterogeneous network conditions: A domain-specific priors and meta-reinforcement learning approach. IEEE Journal on Selected Areas in Communications, 40(8), 2485–2503. https://doi.org/10.1109/JSAC.2022.3180804
  • Kitsak, M., Gallos, L. K., Havlin, S., Liljeros, F., Muchnik, L., Stanley, H. E., & Makse, H. A. (2010). Identification of influential spreaders in complex networks. Nature Physics, 6(11), 888–893. https://doi.org/10.1038/nphys1746
  • Kivela, M., Arenas, A., Barthelemy, M., Gleeson, J. P., Moreno, Y., & Porter, M. A. (2014). Multilayer networks. Journal of Complex Networks, 2(3), 203–271. https://doi.org/10.1093/comnet/cnu016
  • Li, C., Zhang, Y., & Li, X. (2021). Epidemic threshold in temporal multiplex networks with individual layer preference. IEEE Transactions on Network Science and Engineering, 8(1), 814–824. https://doi.org/10.1109/TNSE.2021.3055352
  • Liang, X., Ma, Y., Cheng, G., Fan, C., Yang, Y., & Liu, Z. (2022). Meta-path-based heterogeneous graph neural networks in academic network. International Journal of Machine Learning and Cybernetics, 13(6), 1553–1569. https://doi.org/10.1007/s13042-021-01465-8
  • Liu, C., Yin, H., Sun, Y., Wang, L., & Guo, X. (2022). A grade identification method of critical node in urban road network based on multi-attribute evaluation correction. Applied Sciences, 12(2), 813. https://doi.org/10.3390/app12020813
  • Liu, L., Hu, F., Niu, L., & Peng, T. (2019). Node influence based similarity measure method in heterogeneous network. Acta Electronica Sinica, 47(9), 1929–1936. https://doi.org/10.3969/j.issn.0372-2112.2019.09.023
  • Liu, X. Y., & He, D. B. (2020). Research on of competitive nonlinear dynamic information diffusion modeling in online social network. Chinese Journal of Computers, 43(10), 1842–1861. https://doi.org/10.11897/SP.J.1016.2020.01842
  • Liu, Z. J., Liang, Y., & Zhang, W. (2020). Heterogeneous network combined meta-path node importance analysis method. Journal of Chinese Computer Systems, 41(6), 1188–1194. https://doi.org/10.11896/j.issn.1000-5137.20200265
  • Luo, C., Guan, R., Wang, Z., Li, G., & Zhang, S. (2014). Hetpathmine: A novel transductive classification algorithm on heterogeneous information networks. In Proceedings of the 36th European Conference on IR Research (pp. 210–221).
  • Luo, H., Yan, G., Meng, Z., Junbo, B., Juncheng, L., & Ting, L. (2020). Identifying important nodes in multi-relational networks based on evidence theory. Chinese Journal of Computers, 43(12), 2398–2413. https://doi.org/10.11897/SP.J.1016.2020.02398
  • Molaei, S., Farahbakhsh, R., Salehi, M., & Crespi, N. (2020). Identifying influential nodes in heterogeneous networks. Expert Systems with Applications, 160, Article 113580. https://doi.org/10.1016/j.eswa.2020.113580
  • Pedroche, F., Romance, M., & Criado, R. (2016). A biplex approach to PageRank centrality: From classic to multiplex networks. Chaos: An Interdisciplinary Journal of Nonlinear Science, 26(6), 065301. https://doi.org/10.1063/1.4952955
  • Rahmede, C., Iacovacci, J., Arenas, A., & Bianconi, G. (2018). Centralities of nodes and influences of layers in large multiplex networks. Journal of Complex Networks, 6(5), 733–752. https://doi.org/10.1093/comnet/cnx050
  • Sabidussi, G. (1966). The centrality index of a graph. Psychometrika, 31(4), 581–603. https://doi.org/10.1007/BF02289527
  • Shi, C., Li, Y., Zhang, J., Sun, Y., & Yu, P. S. (2021). A survey of heterogeneous information networks analysis and applications. Chinese Journal of Computers, 33(2), 598–621. https://doi.org/10.13328/j.cnki.jos.000000
  • Shu, J., Yao, X., & Li, R. R. (2022). Node importance evaluation in multiplex heterogeneous networks based on graph embedding. Journal of Beijing University of Posts and Telecommunications, 45(4), 104–109. https://doi.org/10.13190/j.jbupt.2021-214
  • Sun, H., He, F., Huang, J., Sun, Y., Li, Y., Wang, C., He, L., Sun, Z., & Jia, X. (2020). Network embedding for community detection in attributed networks. ACM Transactions on Knowledge Discovery from Data, 14(3), 1–25. https://doi.org/10.1145/3385415
  • Sun, Y., Han, J., Yan, X., Yu, P. S., & Wu, T. (2011). Pathsim: Meta path-based top-k similarity search in heterogeneous information networks. Proceedings of the VLDB Endowment, 4(11), 992–1003. https://doi.org/10.14778/3402707.3402736
  • Tu, X., Jiang, G.-P., Song, Y., & Zhang, X. (2018). Novel multiplex PageRank in multilayer networks. IEEE Access, 6, 12530–12538. https://doi.org/10.1109/ACCESS.2018.2807778
  • Wan, L., Zhang, M., Li, X., Sun, L., Wang, X., & Liu, K. (2022). Identification of important nodes in multilayer heterogeneous networks incorporating multirelational information. IEEE Transactions on Computational Social Systems, 9(6), 1715–1724. https://doi.org/10.1109/TCSS.2022.3161305
  • Wang, D., Tian, F., & Wei, D. (2023). A new centrality ranking method for multilayer networks. Journal of Computational Science, 66, Article 101924. https://doi.org/10.1016/j.jocs.2022.101924
  • Wang, Z., Xia, C., Chen, Z., & Chen, G. (2021). Epidemic propagation with positive and negative preventive information in multiplex networks. IEEE Transactions on Cybernetics, 51(3), 1454–1462. https://doi.org/10.1109/TCYB.2019.2960605
  • Wu, M., He, S., Zhang, Y., Chen, J., Sun, Y., Liu, Y.-Y., Zhang, J., & Poor, H. V. (2019). A tensor-based framework for studying eigenvector multicentrality in multilayer networks. Proceedings of the National Academy of Sciences, 116(31), 15407–15413. https://doi.org/10.1073/pnas.1801378116
  • Yang, C., Xiao, Y., Zhang, Y., Sun, Y., & Han, J. (2022). Heterogeneous network representation learning: A unified framework with survey and benchmark. IEEE Transactions on Knowledge and Data Engineering, 34(10), 4854–4873. https://doi.org/10.1109/TKDE.2020.3045924
  • Yang, X., & Xiao, F. (2021). An improved gravity model to identify influential nodes in complex networks based on k-shell method. Knowledge-Based Systems, 227, Article 107198. https://doi.org/10.1016/j.knosys.2021.107198
  • Yang, Y., Guan, Z., Li, J., Zhao, W., Cui, J., & Wang, Q. (2021). Interpretable and efficient heterogeneous graph convolutional network. IEEE Transactions on Knowledge and Data Engineering. https://doi.org/10.1109/TKDE.2021.3101356
  • Yang, Y., Zhou, L., Du, G., Zou, X., & Ding, H. (2021). Influence maximization based on network embedding in heterogeneous information networks. Transactions on Intelligent Systems, 16(4), 757–765. https://doi.org/10.11992/tis.202009047
  • Yang, Y., Zhou, L., Jin, Z., & Yang, J. (2019). Meta path-based information entropy for modeling social influence in heterogeneous information networks. In 2019 20th IEEE International Conference on Mobile Data Management (MDM) (pp. 557–562).
  • Zhang, C., Song, D., Huang, C., Swami, A., & Chawla, N. V. (2019). Heterogeneous graph neural network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 793–803).
  • Zhao, J., Wen, Q., Sun, S., Wang, Z., Zhang, Y., & Liu, Z. (2021). Multi-view self-supervised heterogeneous graph embedding. In Machine Learning and Knowledge Discovery in Databases. Research Track: European Conference, ECML PKDD (pp. 319–334).
  • Zhou, L., Wang, J., Wang, L., Chen, H., & Kong, B. (2022). Heterogeneous information network representation learning: A survey. Chinese Journal of Computers, 45(1), 160–189. https://doi.org/10.1189/7SP.J.1016.2022.00160
  • Zhu, Y., Xu, Y., Cui, H., Wang, S., Zhang, W., & Tang, J. (2022). Structure-enhanced heterogeneous graph contrastive learning. In Proceedings of the 2022 SIAM International Conference on Data Mining (pp. 82–90).