
A Blockchain-based federated learning framework for secure aggregation and fair incentives

Article: 2316018 | Received 03 Jun 2023, Accepted 02 Feb 2024, Published online: 21 Feb 2024

Abstract

Federated Learning (FL) has gained prominence as a machine learning framework incorporating privacy-preserving mechanisms. However, challenges such as poisoning attacks and free-rider attacks underscore the need for advanced security measures. Therefore, this paper proposes a novel framework that integrates federated learning with blockchain technology to facilitate secure model aggregation and fair incentives in untrustworthy environments. The framework designs a reputation evaluation method using quality as an indicator, and a consensus method based on reputation feedback. The trustworthiness of nodes is dynamically assessed to achieve an efficient and trustworthy model aggregation process while avoiding reputation monopolisation. Furthermore, the paper defines a tailored contribution calculation process for nodes in different roles in an untrusted environment. A reward and punishment scheme based on the joint constraints of contribution and reputation is proposed to attract highly qualified workers to actively participate in federated learning tasks. Theoretical analysis and simulation experiments demonstrate the framework's ability to maintain efficient and secure aggregation under a certain degree of attack, while achieving fair incentives for each role with significantly reduced consensus consumption.

1. Introduction

Machine learning is widely used in areas such as image recognition, natural language processing, and smart healthcare. However, because it relies heavily on gathering enormous volumes of raw data from individuals or organisations, user privacy may be compromised. FL, an efficient solution to the privacy-protection issue in machine learning, was proposed by McMahan et al. (Citation2017). In this distributed machine learning architecture, workers train models locally and send only the updated model parameters, rather than the raw data, to a central server, which aggregates the collected local models into a global model without sacrificing user privacy.

1.1. Challenges in federated learning

Although FL facilitates the protection of user privacy, it also encounters significant challenges in actual deployment.

Poisoning attacks and free-rider attacks: Federated learning is vulnerable to poisoning attacks and free-riding attacks (Zhu et al., Citation2023; Xu & Li, Citation2023). Malicious nodes engage in the training task by altering local data or local model parameters, which reduces the efficacy of federated learning. Alternatively, free-riding nodes receive global models or even rewards without making any real contribution during training. To ensure effective and trustworthy model aggregation and fair incentives, appropriate evaluation mechanisms are required for assessing the model quality and reputation of nodes (Wang et al., Citation2022; Zhang et al., Citation2023).

Single point of failure: Federated learning is susceptible to a single point of failure (K. Zhang et al., Citation2022; Zhu et al., Citation2023) because it depends on a central server to transmit model parameters. Combining federated learning frameworks with blockchain technology moves them towards decentralisation and is an effective solution (Jiang et al., Citation2023; Kim et al., Citation2020; Qu et al., Citation2022). Blockchain is essentially a distributed ledger maintained collaboratively by a peer-to-peer network (Hasan et al., Citation2023; Huang et al., Citation2022). It can take the place of the central server in sustaining the FL process because of its decentralised, tamper-proof, and traceable properties (Liu et al., Citation2023; Madill et al., Citation2022). Therefore, storing local/global models (or data) in the blockchain can help improve the overall integrity of the FL system: blockchain nodes collaborate to maintain the stored data, and together they can audit the stored models/data and detect malicious intent from any node. However, blockchain also suffers from computational and consensus inefficiencies among nodes in real-world applications.

1.2. Contribution

The challenges mentioned above may negatively affect the FL process or even interrupt the federated learning task. Therefore, to cope with these problems, this paper proposes a blockchain-based federated learning framework. The framework achieves secure aggregation and fair incentives in an untrusted environment by defining three components: (1) quality-based reputation assessment; (2) a reputation-based consensus method; and (3) contribution- and reputation-based incentives.

Specifically, reputation assessment is a process of measuring the trustworthiness of a node and generating a reputation value for the node. The weighted average index number method is used to synthesise model quality and historical factors for reputation assessment of nodes. Then a reputation feedback-based consensus method strategically selects nodes with high reputation for model validation and internal consensus to achieve decentralisation and reliable model aggregation with higher efficiency. In addition, a corresponding contribution calculation algorithm is introduced for the role of each node. An incentive method subject to both contribution and reputation constraints is proposed, aiming at attracting highly qualified workers to engage in collaborative learning tasks.

The contributions of this paper are summarised as follows:

  • A reputation method based on model quality is designed. The method uses a weighted average index number approach that combines the quality scores of nodes with historical factors. It reduces the likelihood of the committee being contaminated by malicious nodes, furnishes an unbiased and constantly updated evaluation of the nodes' reliability, and at the same time prevents the issue of reputation monopolisation.

  • A reputation-based consensus method is designed. The method performs role switching of nodes based on reputation values. By only reaching consensus among highly reputable nodes, it reduces communication consumption and ensures the security of the consensus process.

  • A non-cooperative game form is used to evaluate the contributions of different roles. An incentive mechanism based on contribution and reputation is proposed. Nodes performing various roles can receive fair rewards or penalties within a distrustful setting, while also resisting malicious or inactive voting and discouraging high-reputation nodes from engaging in malicious activities.

The rest of the paper is organised as follows. Section 2 describes the related work. Section 3 gives the specific method of this framework. The experimental results of the method are given in Section 4. A summary and directions for further research are provided in Section 5.

2. Related work

As privacy protection issues have gained importance in recent years, FL has been extensively used in edge computing, smart healthcare, and IoT (Alsamhi et al., Citation2022, Citation2023; Ferrag & Shu, Citation2021; Myrzashova et al., Citation2023). These studies rely on the honesty of the FL workers and the fairness and reliability of the system. However, in real-world applications, attackers or unreliable workers may join training activities and use altered, substandard data in FL tasks (Zhang et al., Citation2022).

Client selection in federated learning: To resist poisoning, the FL task should select reliable clients for aggregation. Ben Saad et al. (Citation2023) used a deep reinforcement algorithm to dynamically select a network slice as a trusted participant. The selected participant is in charge of identifying poisoned model updates by leveraging unsupervised machine learning. Song et al. (Citation2022) proposed a reputation model based on the beta distribution function to measure the credibility of local users and designed a reputation-based scheduling policy. These approaches operate on the premise that workers actively participate in the FL task. The most effective way to attract more quality workers to join is to give them incentives.

Guo et al. (Citation2023) leveraged modified NSGA-II to find the Nash equilibrium, and proposed an Incentive Mechanism Design method for Federated Learning based on Stackelberg Game. Yu et al. (Citation2020) proposed the FL Incentive (FLI) approach, which divides a given budget among workers to maximise the collective utility and minimise inequality. However, previous studies have mainly focused on the reward-sharing scheme of the system and have not considered the potential risks from attackers. At the same time, centralised servers in this untrusted environment pose many pitfalls (Singh et al., Citation2022).

Federated learning combined with blockchain: Considering that traditional FL frameworks are subject to a single point of failure, in addition to keeping workers motivated, FL requires the use of distributed and more secure methods for coordinating collaboration among workers. Blockchain has shown its potential for managing FL. There is a lot of existing work (Peng et al., Citation2022; Qin et al., Citation2022; Shayan et al., Citation2021; Zhang & Zhu, Citation2023) that integrates FL and blockchain to achieve high accuracy, efficiency, and security in the FL process.

Xu and Chen (Citation2022) proposed a new hierarchical IoT architecture μDFL for decentralized FL (DFL) using a hybrid proof-of-credit (PoC) block generation protocol and a voting-based chain termination (VCF) consensus protocol to ensure efficiency and protect privacy at the network edge. Li et al. (Citation2021) proposed a Blockchain-based FL Framework (BFLC) incorporating committee consensus. However, it is susceptible to mixing malicious nodes into the committee, which tends to bias the system. Kang et al. (Citation2019) evaluated the trustworthiness of employees based on a multi-weighted subjective logic model. Opinions are maintained by constructing a reputation blockchain to select high-quality nodes for training. A quality-based consensus process (PoQ) based on Kang et al. (Citation2020) was proposed by Qin et al. (Citation2022). Using the same data, PoQ utilises a committee to assess the quality of fresh blocks. However, all of them suffer from reputation monopoly, subjectivity in reputation assessment, and lack of system incentives for various types of nodes in the presence of malicious nodes.

In addition to the limitations of the methods themselves mentioned above, part of the previous work focused only on the safety traceability of the training process, while the other part focused only on the fairness of incentives under the premise of honest workers. A holistic framework achieving both safe training and fair incentives is lacking. Therefore, this paper designs a blockchain-based federated learning framework for secure aggregation and incentive fairness. Combining a quality-based reputation mechanism with a consensus protocol, an incentive allocation mechanism that considers both reputation and contribution is proposed. The framework enhances the motivation of trustworthy workers while filtering malicious updates and facilitating FL. Table 1 summarises the existing work, where B stands for robustness, F for fairness, and C for the consensus method used.

Table 1. Summary of current work.

3. System overview

3.1. Background

In a typical FL scenario, once a new task is published, the FL model aggregation server initialises a global model, which announces the start of FL. Workers obtain the latest global model from the server and update it with their data to generate corresponding local models. The model aggregation server collects these local models and aggregates them into the updated global model (Qi et al., Citation2022). Such a process is considered a single round, and FL is generally carried out over multiple rounds. The symbols and corresponding definitions mainly utilised in the following discussion are summarised in Table 2.

Table 2. Symbols and corresponding definitions mainly utilised.

Suppose there are T workers. Each worker i has a dataset $D_i$ containing $n_i$ training samples, so there are $\sum_{i=1}^{T} n_i$ training samples in total. We use $(x, y)$ to represent the feature and label of a specific training sample. The objective function of FL is as follows: (1) $\min_{\theta} L(F(\theta), D) = \sum_{i=1}^{T} \frac{n_i}{\sum_{j=1}^{T} n_j} L_i(F(\theta), D_i)$ (1) where $D = \bigcup_{i \in [T]} D_i$, $\theta$ is the parameter of model F, L indicates the global loss, and $L_i$ indicates the local loss of worker i: $L_i(F(\theta), D_i) = \frac{1}{n_i} \sum_{(x,y) \in D_i} l(F(\theta, x), y)$, where l is a loss function that measures the distance between the predicted label $F(\theta, x)$ and the ground-truth label y.

In the training process of FL, at the beginning of a local training iteration, each worker sets its local parameter to the global parameter, $\theta_i = \theta$, and then computes its local gradient $G_i$ over k local training epochs. After that, workers upload their local gradients to the server. The server aggregates the local gradients to obtain the global gradient as follows: (2) $\tilde{G} = \sum_{i=1}^{T} \frac{n_i}{\sum_{j=1}^{T} n_j} G_i$ (2) where $\frac{n_i}{\sum_{j=1}^{T} n_j}$ is the weight of worker i, and $G_i = \frac{\partial L_i(\theta, D_i)}{\partial \theta}$ is the local gradient of worker i. In round t, the parameters of the global model F are updated with the global gradient as follows: (3) $\theta_{t+1} = \theta_t - \eta \tilde{G}_t$ (3) where $\eta$ is the learning rate, $\theta_t$ is the parameter in communication iteration t, and $\tilde{G}_t$ is the global gradient. In the next iteration, workers train the model from the updated global model parameters. These steps iterate until the model converges. It is important to note that the raw data never leave the workers during the whole training process; thus, data privacy is well preserved (Gao et al., Citation2022).
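The aggregation and update steps above can be sketched in a few lines of Python. This is a minimal sketch, not the paper's implementation: the function names (`aggregate_gradients`, `update_global`) and the plain-list parameter representation are illustrative.

```python
def aggregate_gradients(local_grads, num_samples):
    """Eq. (2): weighted average of local gradients, weights n_i / sum_j n_j."""
    total = sum(num_samples)
    global_grad = [0.0] * len(local_grads[0])
    for grad, n in zip(local_grads, num_samples):
        weight = n / total
        for d, g in enumerate(grad):
            global_grad[d] += weight * g
    return global_grad


def update_global(theta, global_grad, eta):
    """Eq. (3): one gradient step on the global parameters."""
    return [p - eta * g for p, g in zip(theta, global_grad)]


# Two workers holding 30 and 10 samples respectively
g_tilde = aggregate_gradients([[1.0, 2.0], [3.0, 6.0]], [30, 10])
theta_next = update_global([0.5, 0.5], g_tilde, eta=0.1)
```

In a real deployment the gradients would be tensors (e.g. TensorFlow variables), but the weighting logic is identical.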

3.2. Proposed framework

The architecture of the framework is shown in Figure 1. Two types of entities are included in the scenario. Training node: such a node participates in training by uploading model parameters rather than the original data, and trains local models using local data. The training nodes are denoted as P. Committee node: during every training round, the nodes with high reputation are chosen to serve as the committee nodes for the next round. Such a node uses local data for model evaluation but is not involved in training. The committee nodes are denoted as M, and the leader node is the committee node with the highest quality score. The framework can be logically divided into three layers.

  • Blockchain layer: This layer is the network foundation of this framework. The blockchain is a bridge for sharing data such as global models. The training nodes do not communicate directly, but obtain information, such as the latest global model and reputation values, by synchronising with the blockchain ledger. The committee node collects model updates from the training nodes bound to it via peer-to-peer transmission. After aggregating the new global model, the committee node publishes information such as the model and reputation assessment to the blockchain.

  • Reputation & Incentive Layer: This layer is used to represent the reputation and incentive assessment of the nodes. First, the training nodes share their local models to the committee nodes through the blockchain layer. Each committee node performs quality testing of the local model. The committee nodes collaborate on the evaluation of reputation and incentives by exchanging the test results.

  • Training layer: This layer represents the training part of the FL task. The training layer is based on the blockchain layer and the reputation & incentive layer (node roles are divided according to reputation) to implement the decentralised model aggregation task. In this layer, after obtaining the global model, the training nodes update the model with their local data to obtain a new local model. Then, these local models are collected by the committee nodes through the blockchain network, packaged into blocks along with the global model, etc., and shared on the blockchain layer.

Figure 1. The framework model.


The entire training process is succinctly explained as follows:

  • 1. Publish The Task: The initial model is published to the blockchain by the task publisher for federated learning.

  • 2. Train Local Model: The nodes involved in training download the model and train the local model using local data.

  • 3. Upload Local Model: The training nodes upload the parameters to the phase-bound committee nodes in the form of transactions.

  • 4. Evaluate Outcome: The committee nodes share the local updates they have gathered and use the local dataset as the validation set to assess the model. Based on the evaluation results, the leader calculates reputation, contribution, and incentives.

  • 5. Generate A New Block: After the leader aggregates the global model on the chain, it packages it into a block together with the updated reputations, incentives, etc. Once the committee has reached consensus, the block is added to the blockchain.

Steps 2 through 5 are repeated until the model converges to a preset value or a predetermined number of rounds is reached. In the section that follows, the primary methods of this framework are thoroughly explained.
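The round loop of steps 2-5 can be sketched as follows. The stub callables and names (`run_rounds`, `train_local`, `evaluate`, `aggregate`) are placeholders for the real training, committee evaluation, and aggregation logic, not part of the paper's implementation.

```python
def run_rounds(train_local, evaluate, aggregate, max_rounds, target_loss):
    """Repeat steps 2-5 until convergence or the preset round budget is hit."""
    global_model, loss = [0.0], float("inf")
    for _ in range(max_rounds):
        local_updates = train_local(global_model)  # steps 2-3: train and upload
        qualified = evaluate(local_updates)        # step 4: committee scoring
        global_model, loss = aggregate(qualified)  # step 5: leader aggregates
        if loss <= target_loss:                    # stop once the model converges
            break
    return global_model, loss
```

The real framework would additionally package the aggregated model into a block and wait for committee consensus before starting the next round.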

3.3. Quality-based reputation assessment

This section assesses the reputation of nodes based on model quality to effectively deal with model poisoning and free-riding attacks during federated learning.

3.3.1. Quality score

The committee nodes are in charge of sharing and assessing the collected local updates. The evaluation score of committee member m on training node p in the t-th round is expressed as: (4) $S_p^m = L_m(F_t^m(1)) - L_m(F_t^p(k))$ (4) Assume that there are k epochs in each training round, where $L(\cdot)$ is the loss function. $F_t^m(1)$ is the local model of committee node m trained for only one epoch in the t-th round, and $F_t^p(k)$ is the local model of training node p in this round. $L_m(F_t^p(k))$ denotes the error obtained by committee node m when evaluating $F_t^p(k)$ on its local dataset.

The most straightforward way to validate a model is to measure the difference between the accuracy of the model uploaded by a training node in the current round and that of the previous round's global model. However, accuracy drops sharply, even for valid updates, as the training node updates its local model from the global model $\tilde{G}_{t-1}$ to the $F_t^p(1)$ state (Chen et al., Citation2021). Similarly, there is a significant accuracy discrepancy between $\tilde{G}_{t-1}$ and $F_t^p(k)$, even for valid updates. As a result, the committee node is configured to first carry out legal training locally for one epoch, producing $F_t^m(1)$ as a stand-in evaluation of $F_t^p(1)$. By calculating the difference between the evaluation results of $F_t^m(1)$ and $F_t^p(k)$, the committee node scores each update.

The leader aggregates the scores of each update and uses the median as the final evaluation score of each update, denoted as $S_{mid}$. The quality score of a training node in the t-th round is expressed as: (5) $q_t = \begin{cases} \sigma_1 (S_{mid} - S_h), & S_{mid} > S_h \\ \sigma_2 (S_{mid} - S_h), & S_{mid} \le S_h \end{cases}$ (5) where $S_h$ is the threshold used to distinguish between legitimate and malicious updates; comparing $S_{mid}$ against $S_h$ triggers different quality updates. When $S_{mid} > S_h$, the update is valid and q is positive; when $S_{mid} \le S_h$, the update is malicious and q is negative. The weights of legitimate and harmful updates are balanced by $\sigma_1$ and $\sigma_2$; to suppress malicious updates, $0 < \sigma_1 < \sigma_2$. Algorithm 1 describes the quality score calculation.
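A minimal sketch of Equation (5), assuming the per-committee scores have already been computed via Equation (4); the function and variable names are illustrative.

```python
import statistics


def quality_score(committee_scores, s_h, sigma1, sigma2):
    """Median-aggregate the committee scores, then apply Eq. (5)."""
    assert 0 < sigma1 < sigma2  # malicious updates must be weighted more heavily
    s_mid = statistics.median(committee_scores)
    if s_mid > s_h:
        return sigma1 * (s_mid - s_h)  # valid update: positive quality score
    return sigma2 * (s_mid - s_h)      # malicious update: negative quality score


honest = quality_score([0.3, 0.4, 0.5], s_h=0.1, sigma1=0.5, sigma2=1.0)
poisoned = quality_score([-0.4, -0.3, -0.2], s_h=0.1, sigma1=0.5, sigma2=1.0)
# honest > 0 while poisoned < 0, and |poisoned| is amplified by sigma2
```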

3.3.2. Reputation calculation

Reputation is utilised to gauge a node's reliability and forms the cornerstone of the committee consensus in this paper. To determine the reputation value of the nodes in FL, a reputation evaluation algorithm is introduced. For each node, the system initialises the reputation value to R = 1. The weighted average index number method is used to integrate the previous reputation and the impact of the current interaction (quality score), taking into account that the reputation value should be evaluated objectively and dynamically. The reputation value of a node in the t-th round is calculated as follows: (6) $R_t = \beta R_{t-1} + (1 - \beta) q_t$ (6) where $R_{t-1}$ stands for the reputation value of the node in the previous round, and $\beta$ regulates the weights of past reputation and the direct interaction effect. Algorithm 2 describes the quality-based reputation assessment. The formula is explained in terms of historical reputation and direct interaction effects, respectively.

  • Historical reputation: Considering the impact of time on the computation of reputation values, the reference relevance of a quality score decreases as rounds iterate. $\beta$ can be regarded as a time-decay factor as follows: (7) $R_t = \beta R_{t-1} + (1-\beta) q_t = (1-\beta) q_t + \beta[(1-\beta) q_{t-1} + \beta R_{t-2}] = (1-\beta) q_t + (1-\beta)\beta q_{t-1} + (1-\beta)\beta^2 q_{t-2} + \cdots = (1-\beta) \sum_{k=1}^{t} \beta^{t-k} q_k$ (7) where $\beta^{t-k}$ decays exponentially. Consequently, the reputation value of round t is the sum of the quality scores of each round multiplied by time-varying weights, so updates farther from the current round are assigned smaller weights than more recent ones.

  • Direct interaction effect: The direct interaction effect is the quality score of the model parameters uploaded by the training node in this round. The aforementioned equation clearly shows that the model's reputation updates are more intense the worse the model's quality. Thus, nodes that send fraudulent updates see their reputation swiftly decline.

In addition, the reputation value R is approximately equal to the average of the quality scores over the last $1/(1-\beta)$ rounds. The reputation curve is smoother when $\beta$ is larger, since the quality scores of more rounds are averaged; the reputation value then lags and adapts more slowly because it is calculated over a broader time span. Conversely, when $\beta$ is small, the result is more susceptible to the most recent changes: the reputation value is calculated within a short time window and adapts more "sensitively".
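The update rule of Equation (6) and the effect of $\beta$ can be illustrated with a short sketch; the names and numbers are illustrative.

```python
def update_reputation(r_prev, q_t, beta):
    """Eq. (6): exponentially weighted blend of history and the new quality score."""
    return beta * r_prev + (1 - beta) * q_t


r = 1.0                                 # initial reputation
for q in [0.2, 0.2, -0.5]:              # two honest rounds, then a poisoned one
    r = update_reputation(r, q, beta=0.8)
# the negative score pulls the reputation down, but history dampens the swing;
# a smaller beta would make the drop sharper
```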

3.4. Reputation-based consensus method

The reputation-based consensus method is proposed in this paper, considering that conventional consensus methods consume a significant amount of computational and communication resources (Liao & Cheng, Citation2023; Oliveira et al., Citation2020). The method dynamically categorises node types based on reputation levels, with each node performing the operation associated with its role. This role-switching strategy, which relies on reputation feedback, minimises the chance of malicious nodes mixing into the committee. Security of the consensus process is guaranteed, and communication consumption reduced, by sending messages only to committee nodes instead of all nodes. In addition, identification and filtration of malicious updates depend on model quality, preventing disruptive behaviour from malicious nodes during the training process. The following is the FL workflow after the introduction of committee consensus.

3.4.1. Initialisation

The genesis block is published on the blockchain by the task publisher, and its data are used to initialise the nodes. The genesis block contains the initial model state, the preset number of iteration rounds, the number of valid updates required per round, etc. For each node, the reputation value is initialised to R = 1.

Additionally, each node generates a set of asymmetric keys (sk,pk). The public key pk is used to identify the node and address it, and the private key sk is used to digitally sign transactions in the network.

3.4.2. Reputation-based node segmentation

All nodes in the system are divided, in each training round and based on reputation values, into training nodes $P = \{p_1, p_2, p_3, \ldots, p_c\}$ and committee nodes $M = \{m_1, m_2, m_3, \ldots, m_l\}$ (chosen at random in the first round). According to Section 3.3, a node's reputation indicates how active and reliable it is during the training process, and only nodes that consistently exhibit good behaviour secure a high reputation. Therefore, the nodes with the highest reputation values are selected as committee nodes to lessen the likelihood of malicious nodes mixing into the committee; the remaining nodes serve as training nodes. The leader node, which balances security and overhead, is the committee node with the highest quality score.

Among them, the training node is in the position of downloading the global model, running the model through training using local data, and uploading the generated local model parameters to the committee node. The committee node is responsible for receiving local updates, validating them, and evaluating them. The assessment results are aggregated by the leader node, and a global model is generated based on the results.
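Role assignment per round can be sketched as follows, assuming each node carries a (reputation, quality score) pair; all names in this sketch are illustrative, not from the paper.

```python
def assign_roles(nodes, committee_size):
    """Top nodes by reputation form the committee; the committee node with the
    highest quality score leads; all remaining nodes train."""
    ranked = sorted(nodes, key=lambda n: nodes[n][0], reverse=True)
    committee = ranked[:committee_size]
    training = ranked[committee_size:]
    leader = max(committee, key=lambda n: nodes[n][1])
    return committee, training, leader


# node id -> (reputation, quality score)
nodes = {"a": (0.9, 0.2), "b": (0.8, 0.6), "c": (0.4, 0.9), "d": (0.7, 0.1)}
committee, training, leader = assign_roles(nodes, committee_size=2)
# committee = ["a", "b"], training = ["d", "c"], leader = "b"
```

Note that "c" has the best quality score overall but is excluded from the committee by its low reputation, which is exactly the filtering effect the segmentation is designed for.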

3.4.3. Gradient validation

The steps of gradient validation are as follows:

Step 1: The training nodes upload the generated local model updates to the bound committee.

The signed local models are sent as transactions from the training nodes to the bound committee nodes. The committee nodes collect the updates from the training nodes while sharing within them to get all the updates.

Step 2: The committee nodes score the updates.

The committee nodes first verify the legitimacy of the sender's digital signature, and if the verification is passed, a quality assessment of the model parameters is performed. The details of the evaluation are as follows:

In the t-th training round, each committee node first needs to perform legal local learning for one epoch and compute $L_m(F_t^m(1))$. For the collected updates $F_t^P(k) = \{F_t^{p_1}(k), F_t^{p_2}(k), \ldots\}$, the local dataset is used to validate and score them: $S_P^m = \{S_{p_1}^m, S_{p_2}^m, \ldots\} = \{L_m(F_t^m(1)) - L_m(F_t^{p_1}(k)), L_m(F_t^m(1)) - L_m(F_t^{p_2}(k)), \ldots\}$. Each committee node sends the local updates $F_t^P(k)$ and the evaluation results S to the leader node in the form of a transaction.

Step 3: The leader node verifies the updates.

The leader node gathers the updates as well as the evaluation results, first checking the legitimacy of the sender's digital signature. If the check is successful, the evaluation results from the committees are combined to verify if the update qualifies.

The median of the committee assessment scores is used as the final evaluation score $S_{mid}$ for each update, to prevent potentially malevolent committee members from interfering with validation. $S_{mid}$ is compared with the threshold $S_h$ to distinguish between malicious and qualified updates; malicious updates with $S_{mid} \le S_h$ are removed from the training.
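The leader's median-based filtering step can be sketched as below. Note how a single outlier score from a malicious committee member cannot flip the verdict; the names and numbers are illustrative.

```python
import statistics


def filter_updates(scores_per_update, s_h):
    """Keep only updates whose median committee score exceeds the threshold S_h."""
    return {uid: scores for uid, scores in scores_per_update.items()
            if statistics.median(scores) > s_h}


scores = {
    "p1": [0.4, 0.5, -9.0],   # one committee member scores an honest update low
    "p2": [-0.2, -0.1, 5.0],  # one scores a poisoned update high
}
qualified = filter_updates(scores, s_h=0.1)
# only "p1" survives: the median resists both outlier scores
```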

3.4.4. Candidate block validation

For each update, the leader node calculates its quality score: if the update is malicious, the quality score is negative ($\sigma_2 (S_{mid} - S_h)$), and if the update is honest, the quality score is positive ($\sigma_1 (S_{mid} - S_h)$). Reputation and incentives are calculated for each update based on its quality score.

After verifying a predetermined number of qualified updates, the leader node aggregates them to generate the global model. In Figure , the block details are shown. Along with linking the previous block's hash value, the leader also packages into the block the local updates and their quality scores, the global model, and the reputation and incentive updates, and sends a consistency message to the committee nodes only. The block is confirmed as valid after a supermajority of committee members vote to approve it.

3.4.5. Global training

Since only a certain number of qualified updates are needed to trigger aggregation in each round of training, the training nodes have the flexibility to download the latest global model and send their updates to the committee nodes. The committee nodes assess the quality of these updates and send their evaluations to the leader node. The leader filters out the malicious updates and aggregates the qualified ones to generate the global model. Each node's reputation and incentive are computed, and roles are reassigned according to reputation values, with committee members not eligible for re-election. The next round of training then starts, continuing until the model converges or a preset number of rounds is reached.

3.5. Contribution and reputation-based incentives

Nodes contributing to the system should be rewarded appropriately, because the training process in federated learning consumes their resources; such rewards encourage more trustworthy nodes to participate in the training task with their high-quality data. Likewise, if nodes that cast malicious or inert votes and upload harmful updates are not punished, nothing prevents them from continuing to occupy resources and obstruct the learning process. Consequently, this part evaluates each role's utility within the system to determine its contribution. Contribution and reputation are used as metrics to assign incentive shares and give fair rewards or penalties to each role in the system.

3.5.1. Contribution

All nodes compete with each other in the distribution of rewards in a decentralised federated learning task. A node's portion of the reward is directly proportional to its contribution, so measuring the contribution of a node is the basis of reward distribution. The contribution metric gauges a node's utility to the system during each round.

Nodes in this paper are separated into committee nodes and training nodes. By measuring the contributions made by nodes in different roles, the contribution of node i in each round is defined as follows: (8) $W = \begin{cases} \dfrac{q_t^i}{\sum_{q_t^j > 0} q_t^j}, & i \text{ is a training node} \\ \dfrac{c_i - c_h}{\sum_{c_j > c_h} (c_j - c_h)}, & i \text{ is a committee node} \end{cases}$ (8) where $\sum_{q_t^j > 0} q_t^j$ is the total quality score of all training nodes that provide truthful updates in each round, $c_i$ denotes the number of correct judgments made by committee node i in each round, $c_h$ is the threshold on the number of accurate decisions made by committee nodes, and $\sum_{c_j > c_h} (c_j - c_h)$ is the total number of valid judgments made by all committee nodes in each round, subject to the threshold.

That is, the quality of the models the training nodes submit determines how much they contribute in each round. The larger the share of a node's quality score $q_t^i$ in the total quality score of all honest updates $\sum_{q_t^j > 0} q_t^j$, the greater the contribution of that training node to the system.

In each iteration, the committee nodes score the updates, and these scores represent the judgments made by the committees. Thus a committee node's contribution is based on how many valid judgments it renders: the larger its number of valid judgments $(c_i - c_h)$ within the total $\sum_{c_j > c_h} (c_j - c_h)$, the greater the contribution of that committee node to the system.
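A minimal sketch of Equation (8) for both roles; the function names and sample numbers are illustrative, not from the paper.

```python
def training_contribution(q_i, all_quality_scores):
    """Share of node i's quality score in the total of all positive scores."""
    honest_total = sum(q for q in all_quality_scores if q > 0)
    return q_i / honest_total


def committee_contribution(c_i, all_judgment_counts, c_h):
    """Share of node i's above-threshold valid judgments in the committee total."""
    valid_total = sum(c - c_h for c in all_judgment_counts if c > c_h)
    return (c_i - c_h) / valid_total


w_train = training_contribution(0.3, [0.3, 0.6, -0.2])     # 0.3 / (0.3 + 0.6)
w_committee = committee_contribution(8, [8, 9, 3], c_h=5)  # (8-5) / ((8-5)+(9-5))
# a malicious training node (negative q_i) receives a negative contribution
```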

3.5.2. Incentive calculation

This paper allocates incentives while taking reputation and contribution into account. In each round, the reward share of a node should be proportional to both its reputation value and its contribution. Algorithm 3 describes the algorithm for contribution- and reputation-based incentives. Define the incentive received by node i in each round as follows: (9) $I = W \cdot R$ (9) From Equation (8) above, the incentive of node i in each round is expressed as follows: (10) $I = \begin{cases} \dfrac{q_t^i}{\sum_{q_t^j > 0} q_t^j} R, & i \text{ is a training node} \\ \dfrac{c_i - c_h}{\sum_{c_j > c_h} (c_j - c_h)} R, & i \text{ is a committee node} \end{cases}$ (10)

When i is a training node, the reward or punishment depends on whether its quality score $q_t^i$ is positive or negative: training nodes that contribute high-quality models earn greater rewards. Similarly, when i is a committee node, the reward or punishment depends on whether it has more or fewer valid judgments $c_i - c_h$; honest update verification increases the committee node's incentive.

Particularly, using the reputation value R as the control factor prevents nodes from engaging in malevolent behaviour after improving their reputation while also allowing the nodes who are consistently reliable to obtain higher rewards.

For example, if a committee node scores arbitrarily to interfere with model validation, \((c_i - c_h)\) is negative, and the magnitude of the weight \((c_i - c_h)/\sum_{c_j > c_h}(c_j - c_h)\) grows in direct proportion to the degree of malice. That is, nodes with a high reputation value incur a heavier penalty \(W_i R_i\) for their bad actions. The problem of malicious nodes mixing into the committee is effectively counteracted.

To encourage honest nodes to engage more fully in the federated learning task, R, a coefficient that varies with time, increases incentives for active nodes with the same W.

After considering reputation, contribution, and cost together, the benefit of node i is defined as: (11) \( U_i = I_i - E_i \) (11) where \(E_i\) is the cost consumed by node i in each round of the task. It usually includes computation costs, communication costs, and other costs, such as the fee consumed by the node to download the global model and the resources consumed by storage.

The benefits obtained by nodes are bounded by incentives and costs, so the most direct way for a node to increase its benefit is to receive a higher incentive. Based on quality and reputation, the incentive method above allocates appropriate incentives to nodes with different roles in the system, which promotes the active participation of honest nodes in the training task. At the same time, malicious nodes are resisted: their benefits are equally limited by costs and incentives, and they are penalised through the incentive on top of the costs consumed in each training cycle. In particular, nodes that act maliciously after accumulating a high reputation are penalised at a higher rate, and due to the high cost of deception, malicious nodes cannot achieve their goals.
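As a minimal sketch (our own illustration; the actual values of W, R, and E come from the framework's contribution, reputation, and cost accounting), Equations (9)–(11) combine as:

```python
def incentive(contribution, reputation):
    """Eq. (9): I = W * R. A negative contribution (malicious behaviour)
    turns the incentive into a penalty, amplified by a high reputation R."""
    return contribution * reputation

def benefit(contribution, reputation, cost):
    """Eq. (11): U = I - E, the node's net benefit after computation,
    communication, and other per-round costs E."""
    return incentive(contribution, reputation) - cost

# An honest high-reputation node earns a positive net benefit,
# while a high-reputation node turning malicious is penalised more
# heavily than a low-reputation one, on top of its sunk costs.
honest = benefit(0.4, 0.9, 0.1)           # positive benefit
malicious_high = benefit(-0.3, 0.9, 0.1)  # heavier penalty
malicious_low = benefit(-0.3, 0.2, 0.1)   # lighter penalty
```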

4. Experiment

4.1. Experimental setup

The training task is executed on a GeForce RTX 2080Ti GPU and simulated using the consortium blockchain FISCO BCOS in combination with TensorFlow. 100 nodes were simulated to participate in the FL task, existing as peers in the blockchain network. The maximum transaction size was set to 512 KB and the maximum block size to 10 MB. Each node reads a number of samples from the prepared dataset in each round of the task. To illustrate the adaptability of the proposed framework to Non-IID datasets, we adopted a Dirichlet distribution with hyperparameter α = 1 to generate Non-IID data for the 100 participating peers.
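The Non-IID partition can be reproduced roughly as follows; this NumPy sketch of a Dirichlet split is our own, as the paper does not list its partitioning code:

```python
import numpy as np

def dirichlet_partition(labels, n_clients=100, alpha=1.0, seed=0):
    """Split sample indices among clients so that, for each class, the class's
    samples are apportioned according to a Dirichlet(alpha) draw. Smaller
    alpha yields more skewed (more strongly Non-IID) client shards."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    clients = [[] for _ in range(n_clients)]
    for cls in np.unique(labels):
        idx = np.where(labels == cls)[0]
        rng.shuffle(idx)
        # Fraction of this class assigned to each client.
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props) * len(idx)).astype(int)[:-1]
        for client, shard in zip(clients, np.split(idx, cuts)):
            client.extend(shard.tolist())
    return clients
```

With α = 1 as in the experiment, the class mixture per client is moderately skewed; every sample is assigned to exactly one client.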

4.1.1. Datasets and models

We tested the proposed method on two datasets (Table 3).

Table 3. Datasets and models.

MNIST: It contains 70 K handwritten digit images from 0 to 9. The images are divided into a training set (60 K examples) and a testing set (10 K examples).

CIFAR10: It consists of 60 K coloured images of 10 different classes. The data set is divided into 50 K training examples and 10 K testing examples.

AlexNet and ResNet18 are used as models to validate the proposed framework.

4.1.2. Baseline

We compare our proposed framework with classic baselines FedAvg (McMahan et al., Citation2017), and BFLC (Li et al., Citation2021) to examine method effectiveness under the Non-IID setting.

4.1.3. Attack setting

It is assumed that malicious participants mislead the training model by modifying the labels of the training samples to carry out a poisoning attack. The range of the attack is the proportion of malicious participants in the total nodes (a%). At the start of each experiment, a% of participants are randomly designated as malicious. The rest are honest.
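A minimal sketch of this attack setting, assuming the malicious roles and flipped labels are drawn uniformly at random (the paper does not specify the exact sampling):

```python
import random

def assign_roles(num_nodes, attack_ratio, seed=0):
    """Randomly designate attack_ratio (a%) of the nodes as malicious;
    the rest are honest."""
    rng = random.Random(seed)
    return set(rng.sample(range(num_nodes), int(num_nodes * attack_ratio)))

def flip_labels(labels, num_classes=10, seed=0):
    """Label-flipping poisoning: replace each training label with a
    randomly chosen *different* class to mislead the trained model."""
    rng = random.Random(seed)
    return [rng.choice([c for c in range(num_classes) if c != y])
            for y in labels]
```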

4.2. Analysis of experimental results

4.2.1. Reputation and incentive performance

To verify the effectiveness of the reputation method, we track, in different scenarios, the reputation changes caused by the same behaviour from a training node involved in the federated learning task. The training node is configured to upload honest updates first, then upload malicious updates in the sixth round to disrupt the federated learning task, and resume uploading honest updates in the ninth round. Its reputation value changes as shown in Figure 2. FedAvg, as a baseline method, shows a monotonically growing reputation value because it lacks defense mechanisms and a reputation system. BFLC controls a fixed increase/decrease of the reputation value only by distinguishing honest from malicious behaviour.

Figure 2. Change in reputation value when nodes perform the same behaviour in different approaches.

Figure 3. Reputation variation of different nodes in this method (a) Reputation variation of honest nodes with different qualities (b) Reputation variation of malicious nodes with different classes.

In the initial five rounds of honest behaviour, the nodes' reputation values rise. In contrast to the constant reputation updates of BFLC and FedAvg, our method correlates a node's reputation value positively with the quality of its model. When a node performs malicious activity, the reputation values under both BFLC and our method start to drop, and the reputation value under our method drops faster than under BFLC. This is due to the reputation assessment algorithm, which grants greater weight to malicious updates (Section 3.3.1). Therefore, the lower the quality of the malicious updates, the faster the reputation value drops.

When the node starts acting honestly again, BFLC's reputation value quickly recovers. In our method, the reputation value rebounds slowly, as our assessment integrates historical behaviour with direct interaction effects (model quality). Even if the node resumes providing honest updates in later rounds, its reputation builds up considerably more slowly than the pace at which reputation is lost when an attack is launched. Consequently, only nodes with consistently positive behaviour become high-reputation nodes, reducing the likelihood of the committee being contaminated by malicious nodes.

Additionally, we compare the reputation and incentives of nodes with different qualities. We set up 12 nodes: 8 honest and 4 malicious. The honest nodes are categorised into four levels, L1–L4, whose level is proportional to the amount of data read (L1–L4 read 40%, 30%, 20%, and 10% of the dataset, respectively, with two nodes per level). The malicious nodes are categorised into two classes, general malicious nodes and malicious nodes that launch intermittent attacks (two nodes per class). For display convenience, one honest node from each level and one malicious node from each class are selected for tracking; their reputation value changes are shown in Figure 3(a,b), and their incentive value changes are shown in Figure 4.

Figure 4. Incentive for different nodes.

Figure 3(a) shows that the reputation of honest nodes accumulates gradually and the reputation of high-quality nodes generally surpasses that of low-quality nodes. Because of the direct interaction effect, reputation does not converge to a fixed number. Occasional decreases in reputation occur when a node is chosen to serve on the committee, as committee nodes are not assigned quality scores (Section 3.3.1). For instance, the highly competent honest node L1 quickly advances in reputation and joins the committee: it is chosen for the committee in the fifth round, and its reputation value drops in the sixth round. Subsequent reputation changes repeat this pattern.

As a result, the reputation of honest nodes fluctuates within a credible range, while high-quality nodes maintain a better overall reputation than low-quality nodes. Because committee nodes are elected from among the most reputable nodes, this fluctuation solves the problem of reputation monopolisation to some extent and can prevent targeted attacks by malicious nodes.

It should be noted that the number of committee members, as validation nodes, is much smaller than the number of training nodes, and a node's contribution while serving on the committee is generally valued more than its contribution while participating in training. Combined with the findings in Figure 4, this explains why nodes receive a stronger incentive after being chosen for the committee. Since the incentive weighs both a node's reputation and its contribution, highly qualified (high-reputation) nodes receive a higher incentive than low-qualified nodes.

Similarly, as can be seen in Figure 3(b), the reputation of a general malicious node drops to 0 very quickly; the lower the quality of its models, the more rapidly its reputation drops and the more severe the penalty.

For malicious nodes launching intermittent attacks, reputation decreases faster under malicious updates than it accumulates under honest ones. This is because the history factor carries more weight (β = 0.9) in the reputation calculation than the quality of interactions in the current round. Consider a particular situation: even if a malicious node launching intermittent attacks joins the committee by steadily building up its reputation by the eleventh iteration, being selected for the committee itself decreases its reputation, and, in combination with Figure 4, a highly reputable node committing evil faces a harsher incentive penalty. Due to the high expense of deception, malicious nodes cannot achieve their goals.
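The faster-down-than-up dynamic can be illustrated with a simplified history-weighted update (our sketch; the paper's full assessment in Section 3.3.1 is more elaborate, and `malice_weight` is our assumed amplification factor for malicious updates):

```python
def update_reputation(r_prev, quality, beta=0.9, malice_weight=2.0):
    """History-weighted reputation update: the previous reputation carries
    weight beta = 0.9; the current round's quality carries (1 - beta).
    Malicious (negative-quality) updates are amplified by malice_weight,
    so reputation falls faster than it accumulates."""
    q = quality if quality >= 0 else malice_weight * quality
    return max(0.0, beta * r_prev + (1 - beta) * q)

# One honest round raises reputation slightly; one malicious round
# erases far more than a single honest round gained.
r0 = 0.5
r1 = update_reputation(r0, 1.0)    # small rise
r2 = update_reputation(r1, -1.0)   # large drop
```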

The experimental results show that, despite a certain level of malicious attacks, our method can still assess worker reputation and assign fair incentives to nodes with different roles based on contribution and reputation.

4.2.2. Consensus performance

To verify consensus performance, model accuracy was compared across scenarios as node participation varies. The node participation rate is set to f, and committee nodes make up 40% of the participants. For example, when node participation (f) is 30%, 30 nodes participate in the federated learning task, of which the 12 nodes with the highest reputation are selected as committee nodes. In scenarios with attacks, the percentage of malicious nodes (a%) was 10%.
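The committee selection described here can be sketched as follows (our illustration of top-reputation selection; `node_reputations` is a hypothetical mapping from node id to reputation):

```python
def select_committee(node_reputations, committee_ratio=0.4):
    """Given the participating nodes' reputations (node id -> reputation),
    select the top committee_ratio fraction by reputation as committee
    members; the remaining participants are training nodes."""
    k = int(len(node_reputations) * committee_ratio)
    ranked = sorted(node_reputations, key=node_reputations.get, reverse=True)
    return ranked[:k], ranked[k:]

# With f = 30% participation of 100 nodes, 30 participants yield a
# 12-member committee and 18 training nodes.
reps = {i: i / 30 for i in range(30)}
committee, trainers = select_committee(reps)
```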

Based on the reputation feedback, high-reputation nodes (committee nodes) are selected to participate in the consensus. The local data is used by the committee nodes as the validation set. Each committee member's scores are considered when determining the final evaluation scores, which minimises evaluation errors brought on by inadequate or unequally distributed sample sets.

The experiments show that the proposed consensus method maintains high performance as node participation varies.

As shown in Table 4, the proposed method matches the baseline's performance in the absence of attacks. However, consensus consumption is higher in the baseline because all nodes participate in reaching consensus, making the complexity of the consensus algorithm O(n²) in each round of the training task. Assuming the number of training nodes is Pl and the number of committee nodes is Ml, with T = Pl + Ml, the broadcast-based consensus consumes (Pl + Ml)² communications. Our method, in contrast, reaches consensus internally within the committee, which significantly cuts the communication needed between nodes: its consensus consumption is Pl × Ml.
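The two consumption figures can be compared directly; for instance, with the f = 30% setting above (18 training nodes, 12 committee nodes):

```python
def broadcast_cost(p_l, m_l):
    """All-node broadcast consensus: every one of the T = Pl + Ml nodes
    exchanges messages with every other node, i.e. (Pl + Ml)^2 messages."""
    return (p_l + m_l) ** 2

def committee_cost(p_l, m_l):
    """Committee consensus: each training node communicates its update only
    to the Ml committee members, i.e. Pl * Ml messages."""
    return p_l * m_l

# 18 trainers + 12 committee members:
# broadcast: 30^2 = 900 messages; committee: 18 * 12 = 216 messages.
```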

Table 4. Model accuracy under different node participation (No-attack scenario).

As shown in Table 5, at an attack range of 10% our method maintains its performance relative to the baseline schemes. Owing to the introduction of the reputation method on top of intra-committee consensus, our method exhibits better robustness than BFLC. That is, the proposed consensus method guarantees secure aggregation of the model while reducing communication consumption.

Table 5. Model accuracy under different node participation (Under 10% attack scenario).

4.2.3. Framework performance

To verify the overall performance of the proposed framework, node participation is set to 20%. On the MNIST and CIFAR10 datasets respectively, we compare the accuracy of federated learning under different approaches with various proportions of malicious nodes. The attack range (a%) was set to 10%, 33%, and 50%. For example, when the attack range is 50%, 10 out of the 20 participating nodes are malicious.

The results are shown in Figures 5 and 6, where it is clear that our method performs better when local data are Non-IID. As the attack range widens, the accuracy of FedAvg is severely compromised because it is a baseline approach with no defenses. Unlike BFLC, which relies only on committee verification for defense, our method maintains stable and effective training performance even under 50% attacks, thanks to the introduction of the reputation assessment and the reputation-based consensus method.

Figure 5. Model accuracy with attack range in different approaches on MNIST dataset. (a) under 10% attack. (b) under 33% attack. (c) under 50% attack.

Figure 6. Model accuracy with attack range in different approaches on CIFAR10 dataset. (a) under 10% attack. (b) under 33% attack. (c) under 50% attack.

In addition, attacks of 51% or more are also taken into account. First, since the system is based on a decentralised blockchain network, a malicious party would need 51% of the computational resources to attack the system, a cost far larger than the benefit. Second, even assuming a 51% attack, the system's model accuracy would not be significantly affected. The main cause of accuracy loss is malicious updates being aggregated into the global model, and with committee consensus this happens when, and only when, more than half of the committee seats (Ml/2) are seized by malicious nodes. However, the committee nodes are the Ml most reputable nodes from the previous round, which would require the updates of these malicious committee nodes to have been accepted and positively evaluated by more than Ml/2 malicious nodes already in the previous round's committee, an infinite dependency loop (Li et al., Citation2021). Furthermore, consider the extreme case in which malicious nodes conspire to gain half the committee seats by disguising themselves as normal nodes. This requires that they consistently upload high-quality updates to accumulate a long-term historical reputation, and that more than Ml/2 malicious nodes simultaneously rank among the most reputable in the current round. Given the expense a malicious node would have to pay, this is nearly impossible.

According to experimental findings, our method is effective in filtering malicious updates and maintaining accuracy under a certain level of malicious attacks. It prevents malicious workers from interfering with the learning task and facilitates federated learning for secure and efficient model aggregation.

5. Conclusion

The paper focuses on how to sustain federated learning performance under a certain level of malicious attacks and provide workers with fair incentives. A blockchain-based federated learning framework is proposed, which designs a reputation assessment method using the weighted average index number formula. This assessment objectively and dynamically evaluates node credibility, avoiding the issue of reputation monopolisation and effectively reducing the risk of malicious nodes mixing into the committee. A reputation-based consensus method is designed: after dividing the nodes based on reputation feedback, trusted nodes are selected as committee members for internal consistency verification. This effectively reduces the impact of poisoning attacks while also reducing the consumption of consensus, achieving efficient and reliable model aggregation. In addition, each role node is given a contribution calculation algorithm based on its utility in the system, and an incentive method bounded by both reputation and contribution is proposed. It can fairly incentivise nodes across various roles (even under attack) and counteract malicious or inert voting by highly reputable nodes.

In this paper, secure aggregation and fair incentives are incorporated into a framework. Experimental results indicate that the framework ensures fair incentives for each role node in an untrusted environment, and facilitates secure and efficient aggregation for federated learning with significantly reduced consistency consumption. The framework demonstrates good performance in terms of robustness and fairness and possesses scalability.

While we reduce the communication consumption, the decentralised network assigns some of the computing tasks to the high-reputation nodes, so there will be some computational consumption. The introduction of dynamic reputation thresholds to optimise the number of personnel or the use of grouping strategies to save resource costs in federated learning scenarios with a large number of nodes will be explored in the future. Moreover, nodes' identity privacy protection needs to be improved, and data security guarantees for participants will be considered to further lower the risk of personal data leakage. We will continue to explore and advance the methodology.

Declaration of interest statement

All authors disclosed no relevant relationships.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

The author(s) reported there is no funding associated with the work featured in this article.

References

  • Alsamhi, S. H., Shvetsov, A. V., Hawbani, A., Shvetsova, S. V., Kumar, S., & Zhao, L. (2023). Survey on Federated Learning enabling indoor navigation for industry 4.0 in B5G. Future Generation Computer Systems, 148, 250–265. https://doi.org/10.1016/j.future.2023.06.001
  • Alsamhi, S. H., Almalki, F. A., Afghah, F., Hawbani, A., Shvetsov, A. V., Lee, B., & Song, H. (2022). Drones’ edge intelligence over smart environments in B5G: Blockchain and federated learning synergy. IEEE Transactions on Green Communications and Networking, 6(1), 295–312. https://doi.org/10.1109/TGCN.2021.3132561
  • Ben Saad, S., Brik, B., & Ksentini, A. (2023). Toward securing federated learning against poisoning attacks in zero touch B5G networks. IEEE Transactions on Network and Service Management, 20(2), 1612–1624. https://doi.org/10.1109/TNSM.2023.3278838
  • Chen, H, Asif, S A, Park, J, Shen, C.-C, & Bennis, M. (2021). Robust Blockchained Federated Learning with Model Validation and Proof-of-Stake Inspired Consensus. arXiv preprint arXiv:2101.03300.
  • Ferrag, M. A., & Shu, L. (2021). The performance evaluation of blockchain-based security and privacy systems for the internet of things: A tutorial. IEEE Internet of Things Journal, 8(24), 17236–17260. https://doi.org/10.1109/JIOT.2021.3078072
  • Gao, L., Li, L., Chen, Y., Xu, C., & Xu, M. (2022). FGFL: A blockchain-based fair incentive governor for Federated Learning. Journal of Parallel and Distributed Computing, 163, 283–299. https://doi.org/10.1016/j.jpdc.2022.01.019
  • Guo, W., Wang, Y., & Jiang, P. (2023). Incentive mechanism design for Federated Learning with Stackelberg game perspective in the industrial scenario. Computers & Industrial Engineering, 184, 109592. https://doi.org/10.1016/j.cie.2023.109592
  • Hasan, O., Brunie, L., & Bertino, E. (2023). Privacy-Preserving reputation systems based on blockchain and other cryptographic building blocks: A survey. ACM Computing Surveys, 55(2), 1–37. https://doi.org/10.1145/3490236
  • Huang, C., Zhao, Y., Chen, H., Wang, X., Zhang, Q., Chen, Y., Wang, H., & Lam, K.-Y. (2022). Zkrep: A privacy-preserving scheme for reputation-based blockchain system. IEEE Internet of Things Journal, 9(6), 4330–4342. https://doi.org/10.1109/JIOT.2021.3105273
  • Jiang, T., Shen, G., Guo, C., Cui, Y., & Xie, B. (2023). Bfls: Blockchain and federated learning for sharing threat detection models as cyber threat intelligence. Computer Networks, 224, 109604. https://doi.org/10.1016/j.comnet.2023.109604
  • Kang, J., Xiong, Z., Niyato, D., Zou, Y., Zhang, Y., & Guizani, M. (2020). Reliable federated learning for mobile networks. IEEE Wireless Communications, 27(2), 72–80. https://doi.org/10.1109/MWC.001.1900119
  • Kang, J., Xiong, Z., Niyato, D., Xie, S., & Zhang, J. (2019). Incentive Mechanism for Reliable Federated Learning: A Joint Optimization Approach to Combining Reputation and Contract Theory. IEEE Internet of Things Journal, 6(6), 10700–10714. http://dx.doi.org/10.1109/JIoT.6488907
  • Kim, H., Park, J., Bennis, M., & Kim, S.-L. (2020). Blockchained On-device federated learning. IEEE Communications Letters, 24(6), 1279–1283. https://doi.org/10.1109/LCOMM.2019.2921755
  • Li, Y., Chen, C., Liu, N., Huang, H., Zheng, Z., & Yan, Q. (2021). A blockchain-based decentralized federated learning framework with committee consensus. IEEE Network, 35(1), 234–241. https://doi.org/10.1109/MNET.011.2000263
  • Liao, Z., & Cheng, S. (2023). RVC: A reputation and voting based blockchain consensus mechanism for edge computing-enabled IoT systems. Journal of Network and Computer Applications, 209, 103510. https://doi.org/10.1016/j.jnca.2022.103510
  • Liu, W., He, Y., Wang, X., Duan, Z., Liang, W., & Liu, Y. (2023). BFG: Privacy protection framework for internet of medical things based on blockchain and federated learning. Connection Science, 35(1), 2199951. https://doi.org/10.1080/09540091.2023.2199951
  • Madill, E., Nguyen, B., Leung, C. K., & Rouhani, S. (2022). Scalesfl: A Sharding Solution for Blockchain-Based Federated Learning. Proceedings of the Fourth ACM International Symposium on Blockchain and Secure Critical Infrastructure, 95–106. https://doi.org/10.1145/3494106.3528680.
  • McMahan, B., Moore, E., Ramage, D., Hampson, S., & Arcas, B. A. (2017). Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics (pp. 1273–1282). https://proceedings.mlr.press/v54/mcmahan17a.html.
  • Myrzashova, R., Alsamhi, S. H., Shvetsov, A. V., Hawbani, A., & Wei, X. (2023). Blockchain meets federated learning in healthcare: A systematic review With challenges and opportunities. IEEE Internet of Things Journal, 10(16), 14418–14437. https://doi.org/10.1109/JIOT.2023.3263598
  • Oliveira, M. T. d., Reis, L. H. A., Medeiros, D. S. V., Carrano, R. C., Olabarriaga, S. D., & Mattos, D. M. F. (2020). Blockchain reputation-based consensus: A scalable and resilient mechanism for distributed mistrusting applications. Computer Networks, 179, 107367. https://doi.org/10.1016/j.comnet.2020.107367
  • Peng, Z., Xu, J., Chu, X., Gao, S., Yao, Y., Gu, R., & Tang, Y. (2022). Vfchain: Enabling verifiable and auditable federated learning via blockchain systems. IEEE Transactions on Network Science and Engineering, 9(1), 173–186. https://doi.org/10.1109/TNSE.2021.3050781
  • Qi, J., Lin, F., Chen, Z., Tang, C., Jia, R., & Li, M. (2022). High-Quality model aggregation for blockchain-based federated learning via reputation-motivated task participation. IEEE Internet of Things Journal, 9(19), 18378–18391. https://doi.org/10.1109/JIOT.2022.3160425
  • Qin, Z., Ye, J., Meng, J., Lu, B., & Wang, L. (2022). Privacy-Preserving blockchain-based federated learning for marine internet of things. IEEE Transactions on Computational Social Systems, 9(1), 159–173. https://doi.org/10.1109/TCSS.2021.3100258
  • Qu, Y., Gao, L., Xiang, Y., Shen, S., & Yu, S. (2022). Fedtwin: Blockchain-enabled adaptive asynchronous federated learning for digital twin networks. IEEE Network, 36(6), 183–190. https://doi.org/10.1109/MNET.105.2100620
  • Shayan, M., Fung, C., Yoon, C. J. M., & Beschastnikh, I. (2021). Biscotti: A blockchain system for private and secure federated learning. IEEE Transactions on Parallel and Distributed Systems, 32(7), 1513–1525. https://doi.org/10.1109/TPDS.2020.3044223
  • Singh, S., Rathore, S., Alfarraj, O., Tolba, A., & Yoon, B. (2022). A framework for privacy-preservation of IoT healthcare data using Federated Learning and blockchain technology. Future Generation Computer Systems, 129, 380–388. https://doi.org/10.1016/j.future.2021.11.028
  • Song, Z., Sun, H., Yang, H. H., Wang, X., Zhang, Y., & Quek, T. Q. S. (2022). Reputation-Based federated learning for secure wireless networks. IEEE Internet of Things Journal, 9(2), 1212–1226. https://doi.org/10.1109/JIOT.2021.3079104
  • Wang, Z., Hu, Q., Li, R., Xu, M., & Xiong, Z. (2022). Incentive Mechanism Design for Joint Resource Allocation in Blockchain-based Federated Learning (arXiv:2202.10938). arXiv. http://arxiv.org/abs/2202.10938.
  • Xu, M., & Li, X. (2023). FedG2L: A privacy-preserving federated learning scheme base on “G2L” against poisoning attack. Connection Science, 35(1), 2197173. https://doi.org/10.1080/09540091.2023.2197173
  • Xu, R., & Chen, Y. (2022). Mdfl: A secure microchained decentralized federated learning fabric atop IoT networks. IEEE Transactions on Network and Service Management, 19(3), 2677–2688. https://doi.org/10.1109/TNSM.2022.3179892
  • Yu, H., Liu, Z., Liu, Y., Chen, T., Cong, M., Weng, X., Niyato, D., & Yang, Q. (2020). A fairness-aware incentive scheme for federated learning. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 393–399. https://doi.org/10.1145/3375627.3375840
  • Zhang, K., Song, X., Zhang, C., & Yu, S. (2022). Challenges and future directions of secure federated learning: A survey. Frontiers of Computer Science, 16(5), 165817. https://doi.org/10.1007/s11704-021-0598-z
  • Zhang, B., Wang, X., Xie, R., Li, C., Zhang, H., & Jiang, F. (2023). A reputation mechanism based Deep Reinforcement Learning and blockchain to suppress selfish node attack motivation in Vehicular Ad-Hoc Network. Future Generation Computer Systems, 139, 17–28. https://doi.org/10.1016/j.future.2022.09.010
  • Zhang, S., & Zhu, J. (2023). Privacy protection federated learning framework based on blockchain and committee consensus in IoT devices. 2023 IEEE 47th Annual Computers, Software, and Applications Conference (COMPSAC), 627–636. https://doi.org/10.1109/COMPSAC57700.2023.00088
  • Zhang, T., Gao, L., He, C., Zhang, M., Krishnamachari, B., & Avestimehr, S. (2022). Federated Learning for Internet of Things: Applications, Challenges, and Opportunities. (arXiv:2111.07494). arXiv. http://arxiv.org/abs/2111.07494.
  • Zhu, J., Cao, J., Saxena, D., Jiang, S., & Ferradi, H. (2023). Blockchain-empowered federated learning: Challenges, solutions, and future directions. ACM Computing Surveys, 55(11), 1–31. https://doi.org/10.1145/3570953