A modified projective forward-backward splitting algorithm for variational inclusion problems to predict Parkinson's disease

Abstract

This research studies variational inclusion problems, a branch of optimization. A modified projective forward-backward splitting algorithm is constructed to solve them. The algorithm combines an inertial technique, which speeds up convergence, with a projection step that lets it handle several regularized machine learning models and obtain a good model fit. To evaluate the performance of the classification models employed in this research, four evaluation metrics are computed: accuracy, F1-score, recall, and precision. The highest performance values of 92.86% accuracy, 62.50% precision, 100% recall, and 76.92% F1-score show that our algorithm performs better than the other machine learning models.

Keywords:

Mathematics Subject Classifications:

1. Introduction

The variational inclusion problem (VIP) is to find a point $ω^{*}$ in a real Hilbert space $H$ such that
$0 \in (F + G) ω^{*},$ (1)
where $F : H \to H$ is a single-valued mapping and $G : H \to 2^{H}$ is a multivalued mapping. Solving the VIP (1) is useful in applications because many problems can be formulated in this form, such as minimization problems, machine learning, image processing, and signal processing; see [Citation1–3]. This paper focuses on modifying efficient algorithms for solving the VIP (1) and applying them to data classification. The inertial extrapolation technique, which was introduced in 1964, is one of the techniques for improving an algorithm's convergence [Citation4]. Its main feature is that the next iterate is constructed from the previous two iterates. This technique was later used extensively with variational inclusion algorithms by many authors; see [Citation5–9]. The inertial forward-backward algorithm (IFB) is the original inertial variational inclusion algorithm, introduced by Moudafi and Oliny [Citation10]. It is generated by $ω^{0}, ω^{1} \in H$ and
$\begin{aligned} y^{k} & = ω^{k} + σ^{k} (ω^{k} - ω^{k - 1}), \\ ω^{k + 1} & = J_{τ^{k}}^{G} (y^{k} - τ^{k} F ω^{k}), \quad k \geq 0, \end{aligned}$ (2)
where $\{τ^{k}\}$ is a sequence of positive real numbers. Weak convergence of the iterative sequence was established under a condition on the sequence $\{ω^{k}\}$ and the parameter $σ^{k}$, together with a co-coercivity condition on $F$. Following the idea of Moudafi and Oliny [Citation10], Peeyada et al. [Citation11] introduced an inertial Mann forward-backward splitting algorithm (IMFB) for the VIP (1) by taking $ω^{0}, ω^{1} \in H$ and
$\begin{aligned} y^{k} & = ω^{k} + σ^{k} (ω^{k} - ω^{k - 1}), \\ z^{k} & = y^{k} + α^{k} (ω^{k} - y^{k}), \\ ω^{k + 1} & = J_{τ^{k}}^{G} (z^{k} - τ^{k} F z^{k}), \quad k \geq 0, \end{aligned}$ (3)
where $\{τ^{k}\}$ is a sequence of positive real numbers and $\{α^{k}\} \subset [0, 1]$. Under the same co-coercivity condition on $F$, suitable conditions on the parameters $α^{k}, τ^{k}$, and $\sum_{k = 1}^{\infty} σ^{k} ‖ ω^{k} - ω^{k - 1} ‖ < \infty$, weak convergence of the algorithm was proved. The IMFB was used in machine learning for breast cancer classification, and its efficiency was shown by comparison with other algorithms. The regularized least squares model was cast as the VIP (1) to obtain an optimal fitting model, which makes the IMFB useful for breast cancer classification. Many branches of optimization have been used in various neural networks and medical fields in recent years [Citation12–16].

Parkinson's disease is a gradual degenerative disease of the brain that is common in the elderly. There is currently no cure for this disease, although early detection and treatment by a medical practitioner can help slow its progression and improve the quality of life. Many machine learning methods, such as Bayesian optimization (BO), Support Vector Machine (SVM), Random Forest (RF), and Logistic Regression (LR) (see [Citation17]), have been used for Parkinson's disease detection. The extreme learning machine (ELM) has also been used for Parkinson's disease detection (see [Citation18]); with the aid of feature selection techniques, that method produced an efficient model.
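Iterations (2) and (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy least squares problem, the constant parameters `sigma`, `alpha`, `tau`, and all function names are hypothetical and not taken from [Citation10] or [Citation11]. The resolvent used here is the soft-thresholding operator, which corresponds to choosing $G = \partial (λ ‖ \cdot ‖_{1})$.

```python
import numpy as np

def soft_threshold(v, t):
    """Resolvent of G = subdifferential of lam*||.||_1, with threshold t = tau*lam."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ifb_step(w, w_prev, grad, resolvent, tau, sigma):
    """One step of iteration (2): inertial extrapolation, then forward-backward."""
    y = w + sigma * (w - w_prev)
    return resolvent(y - tau * grad(w), tau)

def imfb_step(w, w_prev, grad, resolvent, tau, sigma, alpha):
    """One step of iteration (3): inertial step, Mann averaging, forward-backward."""
    y = w + sigma * (w - w_prev)
    z = y + alpha * (w - y)
    return resolvent(z - tau * grad(z), tau)

# Hypothetical toy problem: min 0.5*||A w - b||^2 + lam*||w||_1
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([2.0, 0.05])
lam = 0.1
grad = lambda w: A.T @ (A @ w - b)                 # F = gradient of the smooth part
resolvent = lambda v, tau: soft_threshold(v, tau * lam)
tau = 1.0 / np.linalg.norm(A, 2) ** 2              # constant stepsize (an assumption)

def run(step, n_iter=500, **params):
    w_prev = w = np.zeros(2)
    for _ in range(n_iter):
        w, w_prev = step(w, w_prev, grad, resolvent, tau, **params), w
    return w

w_ifb = run(ifb_step, sigma=0.1)
w_imfb = run(imfb_step, sigma=0.1, alpha=0.5)
```

Because the toy matrix is diagonal, the minimizer can be computed by hand as $(0.975, 0)$, which gives a simple check on both iterations.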

To solve the VIP (1), we present a modified inertial two-step Mann forward-backward splitting algorithm with a projective method, inspired by prior research. Weak convergence is established under appropriate conditions. In the final section, we show how our algorithm can be used for Parkinson's disease detection by the ELM without data cleaning techniques and compare it with various machine learning methods.

2. Preliminaries

In this section, we provide the definitions and lemmas needed to prove the convergence of our optimization algorithm in Section 3. We use the symbols $⇀$ and $\to$ to denote weak and strong convergence, respectively.

Definition 2.1

$F : H \to H$ is called $L$-Lipschitz continuous if there exists $L > 0$ such that $‖ F ω - F ν ‖ \leq L ‖ ω - ν ‖$ for all $ω, ν \in H$. $F$ is called a nonexpansive mapping if $L = 1$.

Definition 2.2

Assume that $G : H \to 2^{H}$ is a multivalued mapping and that its graph is denoted by $graph (G)$. $G$ is called

(i)	monotone if ∀ $(ω, u), (ν, v) \in graph (G)$ , $〈 u - v, ω - ν 〉 \geq 0,$
(ii)	maximal monotone if ∀ $(ω, u) \in H \times H$ , $〈 u - v, ω - ν 〉 \geq 0$ ∀ $(ν, v) \in graph (G)$ ⇔ $(ω, u) \in graph (G)$ .
(iii)	α-inverse strongly monotone if ∃ $α > 0$ such that $〈 G x - G y, x - y 〉 \geq α ‖ G x - G y ‖^{2}$ ∀ $x, y \in H$ .

Lemma 2.3

[Citation19]

Assume $F : H \to H$ is a mapping and that $G : H \to 2^{H}$ is a maximal monotone mapping. Let $T_{τ} := (I + τ G)^{- 1} (I - τ F)$ , $τ > 0$ , and $Fix (T_{τ})$ be the set of the fixed points of $T_{τ}$ . Then $Fix (T_{τ}) = (F + G)^{- 1} (0)$ .
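The identity in Lemma 2.3 follows from a short chain of equivalences. The derivation below is a sketch for a fixed $τ > 0$ that uses only the definition of the resolvent $(I + τ G)^{- 1}$:

```latex
\begin{aligned}
ω = T_{τ} ω
  &\iff ω = (I + τ G)^{-1} (ω - τ F ω) \\
  &\iff ω - τ F ω \in ω + τ G ω \\
  &\iff 0 \in F ω + G ω .
\end{aligned}
```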

Lemma 2.4

[Citation20]

Assume $G : H \to 2^{H}$ is a maximal monotone mapping and that $F : H \to H$ is a Lipschitz continuous and monotone mapping. Then the mapping $F + G$ is maximal monotone.

Lemma 2.5

[Citation21]

Assume $F : H \to H$ is an α-inverse-strongly monotone mapping. Then

(i)	$F$ is $\frac{1}{α}$-Lipschitz continuous and monotone.
(ii)	If $τ \in (0, 2 α]$ is a constant, then $I - τ F$ is nonexpansive, where $I$ is the identity mapping on $H$.

Lemma 2.6

[Citation22]

Assume $F : H \to H$ is an α-inverse-strongly monotone mapping and that $G : H \to 2^{H}$ is a maximal monotone mapping. Then

(i)	For $τ > 0$, $Fix (J_{τ}^{G} (I - τ F)) = (F + G)^{- 1} (0)$.
(ii)	For $0 < τ \leq \bar{τ}$ and $x \in H$, $‖ x - J_{τ}^{G} (I - τ F) x ‖ \leq 2 ‖ x - J_{\bar{τ}}^{G} (I - \bar{τ} F) x ‖$.

Lemma 2.7

[Citation23]

Assume $F : H \to H$ is a nonexpansive mapping with $Fix (F) \neq \emptyset$. If there exists a sequence $\{ω^{k}\}$ in $H$ with $ω^{k} ⇀ ω \in H$ and $‖ ω^{k} - F ω^{k} ‖ \to 0$, then $ω \in Fix (F)$.

Lemma 2.8

[Citation24]

Assume that $\{α^{k}\}$ and $\{β^{k}\}$ are nonnegative real sequences with $\sum_{k = 1}^{\infty} β^{k} < \infty$ and $α^{k + 1} \leq α^{k} + β^{k}$. Then $\{α^{k}\}$ is convergent.

Lemma 2.9

[Citation19, Opial]

Assume that $C$ is a nonempty subset of $H$ and $\{ω^{k}\}$ is a sequence in $H$. Assume that the following are true.

(i)	$\{‖ ω^{k} - ω ‖\}$ converges, ∀ $ω \in C$.
(ii)	Every weak sequential cluster point of $\{ω^{k}\}$ is in $C$.

Then $\{ω^{k}\}$ converges weakly to a point in $C$.

3. Optimization algorithm

This section presents the convergence analysis. Let $C$ be a nonempty closed convex subset of a real Hilbert space $H$, and consider the following conditions.

(C1)	$G : H \to 2^{H}$ is a maximal monotone mapping.
(C2)	$F : H \to H$ is an α-inverse-strongly monotone mapping.
(C3)	$Ψ := (F + G)^{- 1} (0) \cap C$ is nonempty.

Remark 3.1

From Algorithm 1, we see the following. (i) By the choice of $α^{k}$, $β^{k}$, and $P_{C}$, Algorithm 1 reduces to other modified forward-backward splitting algorithms; e.g. if $α^{k} = 0$, the reduced algorithm has the form $\begin{aligned} z^{k} & = ω^{k} + σ^{k} (ω^{k} - ω^{k - 1}), \\ y^{k} & = J_{τ^{k}}^{G} (I - τ^{k} F) z^{k}, \\ ω^{k + 1} & = P_{C} (β^{k} y^{k} + (1 - β^{k}) J_{τ^{k}}^{G} (I - τ^{k} F) y^{k}), \quad k \geq 1. \end{aligned}$ (ii) Algorithm 1 recovers the inertial forward-backward algorithm in [Citation10] when we take $α^{k} = β^{k} = 0$ and $C = H$. (iii) Compared with a standard method, an inertial projective method is more flexible: the inertial term can be tuned for faster convergence, and the projection operator can be chosen to reach a solution faster. The following diagram shows the first step of the comparison between an inertial projective method and a standard method in $R^{2}$:

From Figure 1, we see that $ω^{k}$ is updated to $z^{k}$ by adding $σ^{k} (ω^{k} - ω^{k - 1})$ at every step, so the speed-up of the algorithm depends on choosing a suitable $σ^{k}$. The iterate $ω^{k + 1}$ (black vector) is in turn moved to a new $ω^{k + 1}$ (e.g. the orange or blue vector) depending on the choice of $C$.
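The statement of Algorithm 1 is not reproduced in this text. The Python sketch below is therefore an inferred reading, not the authors' listing: it assumes the update $z^{k} \to y^{k} \to ω^{k + 1}$ that is consistent with Remark 3.1(i) and with the estimates used in the proof of Theorem 3.2, namely $y^{k} = α^{k} z^{k} + (1 - α^{k}) J_{τ^{k}}^{G} (I - τ^{k} F) z^{k}$ followed by the projected second Mann step. Constant `tau`, `alpha`, `beta`, the callable `sigma`, and the toy problem are all assumptions made for illustration.

```python
import numpy as np

def algorithm1_sketch(grad, resolvent, project, w0, w1, tau, sigma,
                      alpha, beta, n_iter=500):
    """Inferred two-step inertial Mann forward-backward iteration with projection."""
    # Forward-backward operator J_tau^G (I - tau F)
    T = lambda v: resolvent(v - tau * grad(v), tau)
    w_prev, w = np.asarray(w0, float), np.asarray(w1, float)
    for k in range(1, n_iter + 1):
        z = w + sigma(k, w, w_prev) * (w - w_prev)             # inertial step
        y = alpha * z + (1 - alpha) * T(z)                     # first Mann step
        w_prev, w = w, project(beta * y + (1 - beta) * T(y))   # second Mann step, then P_C
    return w

# Hypothetical toy problem: min 0.5*||A w - b||^2 + lam*||w||_1 with C the unit ball
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([2.0, 0.05])
lam = 0.1
grad = lambda w: A.T @ (A @ w - b)
resolvent = lambda v, tau: np.sign(v) * np.maximum(np.abs(v) - tau * lam, 0.0)
project = lambda v: v / max(1.0, np.linalg.norm(v))            # P_C for {w : ||w||_2 <= 1}

w = algorithm1_sketch(grad, resolvent, project, w0=[3.0, 3.0], w1=[3.0, 3.0],
                      tau=0.25, sigma=lambda k, w, w_prev: 0.1,
                      alpha=0.5, beta=0.5)
```

In this toy setting the unconstrained minimizer $(0.975, 0)$ lies inside the unit ball, so the set Ψ of condition (C3) is nonempty and the iterates should approach that point.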

We are now ready for the main convergence theorem.

Figure 1. The structure comparison of an inertial projective method and a standard method.

Figure 2. Training-validation accuracy plots of Algorithm 1 for the RLS $L_{1}$ ELM model.

Figure 3. Training-validation loss plots of Algorithm 1 for the RLS $L_{1}$ ELM model.

Figure 4. Training-validation accuracy plots of Algorithm 1 for the RLS $L_{1}$ -C $L_{2}$ ELM model.

Figure 5. Training-validation loss plots of Algorithm 1 for the RLS $L_{1}$ -C $L_{2}$ ELM model.

Figure 6. Training-validation accuracy plots of IMFB (3) for the RLS $L_{1}$ ELM model.

Theorem 3.2

Let $\{ω^{k}\}$ be the sequence generated by Algorithm 1, and assume that conditions (C1)–(C3) hold. Then $\{ω^{k}\}$ converges weakly to an element of Ψ.

Proof.

Let $ω^{*} \in Ψ$. Since $P_{C}$ and $J_{τ^{k}}^{G} (I - τ^{k} F)$ are nonexpansive, we have
$\begin{aligned} ‖ ω^{k + 1} - ω^{*} ‖ & = ‖ P_{C} (β^{k} y^{k} + (1 - β^{k}) J_{τ^{k}}^{G} (I - τ^{k} F) y^{k}) - ω^{*} ‖ \\ & \leq ‖ β^{k} y^{k} + (1 - β^{k}) J_{τ^{k}}^{G} (I - τ^{k} F) y^{k} - ω^{*} ‖ \\ & \leq β^{k} ‖ y^{k} - ω^{*} ‖ + (1 - β^{k}) ‖ J_{τ^{k}}^{G} (I - τ^{k} F) y^{k} - ω^{*} ‖ \\ & \leq ‖ y^{k} - ω^{*} ‖ \\ & \leq α^{k} ‖ z^{k} - ω^{*} ‖ + (1 - α^{k}) ‖ J_{τ^{k}}^{G} (I - τ^{k} F) z^{k} - ω^{*} ‖ \\ & \leq ‖ z^{k} - ω^{*} ‖ \\ & \leq ‖ ω^{k} - ω^{*} ‖ + σ^{k} ‖ ω^{k} - ω^{k - 1} ‖ . \end{aligned}$ (4)
By Lemma 2.8 and the condition $\sum_{k = 1}^{\infty} σ^{k} ‖ ω^{k} - ω^{k - 1} ‖ < \infty$, we obtain that $\lim_{k \to \infty} ‖ ω^{k} - ω^{*} ‖$ exists. Since $‖ ω^{k + 1} - ω^{*} ‖ \leq ‖ y^{k} - ω^{*} ‖ \leq ‖ z^{k} - ω^{*} ‖ \leq ‖ ω^{k} - ω^{*} ‖ + σ^{k} ‖ ω^{k} - ω^{k - 1} ‖$, we also obtain
$\lim_{k \to \infty} ‖ ω^{k} - ω^{*} ‖ = \lim_{k \to \infty} ‖ y^{k} - ω^{*} ‖ = \lim_{k \to \infty} ‖ z^{k} - ω^{*} ‖ .$ (5)
This implies that $\{ω^{k}\}$ is bounded, and hence $\{z^{k}\}$ and $\{F z^{k}\}$ are also bounded. Since $F$ is an α-inverse-strongly monotone mapping (see Lemma 2.5) and $J_{τ^{k}}^{G}$ is a firmly nonexpansive mapping, it follows from (4) that
$\begin{aligned} ‖ ω^{k + 1} - ω^{*} ‖^{2} & \leq α^{k} ‖ z^{k} - ω^{*} ‖^{2} + (1 - α^{k}) ‖ J_{τ^{k}}^{G} (I - τ^{k} F) z^{k} - ω^{*} ‖^{2} \\ & \leq α^{k} ‖ z^{k} - ω^{*} ‖^{2} + (1 - α^{k}) \big( ‖ z^{k} - ω^{*} - τ^{k} (F z^{k} - F ω^{*}) ‖^{2} \\ & \quad - ‖ z^{k} - J_{τ^{k}}^{G} (I - τ^{k} F) z^{k} - τ^{k} (F z^{k} - F ω^{*}) ‖^{2} \big) \\ & \leq α^{k} ‖ z^{k} - ω^{*} ‖^{2} + (1 - α^{k}) \big( ‖ z^{k} - ω^{*} ‖^{2} + (τ^{k})^{2} ‖ F z^{k} - F ω^{*} ‖^{2} \\ & \quad - 2 τ^{k} 〈 z^{k} - ω^{*}, F z^{k} - F ω^{*} 〉 \\ & \quad - ‖ z^{k} - J_{τ^{k}}^{G} (I - τ^{k} F) z^{k} - τ^{k} (F z^{k} - F ω^{*}) ‖^{2} \big) \\ & \leq ‖ z^{k} - ω^{*} ‖^{2} - (1 - α^{k}) \big( τ^{k} (2 α - τ^{k}) ‖ F z^{k} - F ω^{*} ‖^{2} \\ & \quad + ‖ z^{k} - J_{τ^{k}}^{G} (I - τ^{k} F) z^{k} - τ^{k} (F z^{k} - F ω^{*}) ‖^{2} \big) . \end{aligned}$
It follows from (5), $\underset{k \to \infty}{\lim\sup} α^{k} < 1$, $0 < \underset{k \to \infty}{\lim\inf} τ^{k} \leq \underset{k \to \infty}{\lim\sup} τ^{k} < 2 α$, and $\sum_{k = 1}^{\infty} σ^{k} ‖ ω^{k} - ω^{k - 1} ‖ < \infty$ that
$\lim_{k \to \infty} ‖ F z^{k} - F ω^{*} ‖ = \lim_{k \to \infty} ‖ z^{k} - J_{τ^{k}}^{G} (I - τ^{k} F) z^{k} ‖ = 0.$ (6)
Since $\underset{k \to \infty}{\lim\inf} τ^{k} > 0$, there exists $τ > 0$ such that $0 < τ \leq τ^{k}$ for all sufficiently large $k$. By Lemma 2.6(ii), we obtain $‖ z^{k} - J_{τ}^{G} (I - τ F) z^{k} ‖ \leq 2 ‖ z^{k} - J_{τ^{k}}^{G} (I - τ^{k} F) z^{k} ‖ .$ It follows from (6) that
$\lim_{k \to \infty} ‖ z^{k} - J_{τ}^{G} (I - τ F) z^{k} ‖ = 0.$ (7)
We next let $\hat{ω}$ be a weak sequential cluster point of $\{ω^{k}\}$. Since $C$ is a closed convex set, $\hat{ω} \in C$. It follows from (5) and (7) that $\hat{ω} \in Ψ$ by Lemma 2.7. Since $\lim_{k \to \infty} ‖ ω^{k} - \hat{ω} ‖$ exists, by Lemma 2.9, we obtain that $\{ω^{k}\}$ converges weakly to $\hat{ω}$. This completes the proof of Theorem 3.2.

4. Data classification problem

Parkinson's disease (PD) is a non-communicable movement disorder characterized by the progressive degeneration of dopaminergic neurones in the midbrain. Its severity level is classified as stage 1, 2, 3, and severe condition. In medical practice, it is often difficult to identify the severity of Parkinson's disease and predict its progression. The disease often occurs in individuals older than 60. If not adequately treated, it can lead to an inability to control walking, which may cause accidents and a risk of disability. Therefore, if a family member shows symptoms related to this disease, they should see a doctor immediately for treatment and to prevent harm. Using machine learning to help analyze the likelihood that a patient will develop the disease is thus very important. In this paper, we apply our proposed Algorithm 1 in an extreme learning machine (ELM) to find the optimal weights, using the PD dataset from the UCI Machine Learning Repository. This dataset is available online at the UCI machine learning website [Citation25] and was published in [Citation26, Citation27]. The dataset consists of 23 attributes and 195 instances and was first created by Max Little in a collaboration between Oxford University and the National Centre for Voice and Speech. An overview of the data is shown in Table 1.

Table 1. The overview of PD dataset from UCI Machine Learning Repository.

Very recently, Elshewey et al. [Citation17] used Bayesian optimization (BO) [Citation28] to optimize the hyperparameters of the following machine learning models: Support Vector Machine (SVM) [Citation29], Random Forest (RF) [Citation30], Logistic Regression (LR) [Citation31], Naive Bayes (NB) [Citation32], Ridge Classifier (RC) [Citation33], and Decision Tree (DT) [Citation34], to obtain better accuracy on the PD dataset with min–max normalization. Table 2 compares the accuracy of these machine learning methods, with BO-optimized hyperparameters and with default parameters, against our algorithm in an extreme learning machine (ELM).

To understand ELM, which was first introduced by Huang et al. [Citation35], we let $K := \{(x^{k}, b^{k}) : x^{k} \in R^{n}, b^{k} \in R^{m}, k = 1, 2, \dots, N\}$ be a training set of $N$ distinct samples, where $x^{k}$ is an input training datum and $b^{k}$ is the corresponding training target. The output function of ELM for single-hidden-layer feedforward neural networks (SLFNs) with $L$ hidden nodes is $O^{k} = \sum_{i = 1}^{L} ω_{i} g (a_{i} x^{k} + b_{i}),$ where $g$ is an activation function, $a_{i}$ and $b_{i}$ are the weight and the bias at the $i$-th hidden node, respectively, and $ω_{i}$ is the optimal output weight at the $i$-th hidden node. The hidden layer output matrix $A$ is defined as follows: $A = \begin{bmatrix} g (a_{1} x^{1} + b_{1}) & \dots & g (a_{L} x^{1} + b_{L}) \\ ⋮ & ⋱ & ⋮ \\ g (a_{1} x^{N} + b_{1}) & \dots & g (a_{L} x^{N} + b_{L}) \end{bmatrix}$ The main goal of ELM is to find the optimal output weight $ω = [ω_{1}, \dots, ω_{L}]^{T}$ such that $A ω = B$, where $B = [b^{1}, \dots, b^{N}]^{T}$ is the training target data. In some cases, finding an exact solution of the linear equation $A ω = B$ may be difficult, so a least squares problem is considered to approximate the solution. Also, for suitable prediction, overfitting of the model should be taken into account. In this paper, we avoid overfitting by using regularized least squares (RLS) and show the good fit of our model through accuracy and loss plots. The following six regularized least squares models can be solved by our proposed algorithm with the settings below, for a regularization parameter $λ > 0$ and a constraint constant $β > 0$:

(i)	setting $F (ω) \equiv \nabla (\frac{1}{2} ‖ A ω - B ‖_{2}^{2})$, $G (ω) \equiv \partial (λ ‖ ω ‖_{1})$, $C = H$ in regularized least squares by $L_{1}$ (RLS $L_{1}$): $\min_{ω \in R^{L}} \{\frac{1}{2} ‖ A ω - B ‖_{2}^{2} + λ ‖ ω ‖_{1}\};$ (8)
(ii)	setting $F (ω) \equiv \nabla (\frac{1}{2} ‖ A ω - B ‖_{2}^{2})$, $G (ω) \equiv \partial (λ ‖ ω ‖_{2}^{2})$, $C = H$ in regularized least squares by $L_{2}$ (RLS $L_{2}$): $\min_{ω \in R^{L}} \{\frac{1}{2} ‖ A ω - B ‖_{2}^{2} + λ ‖ ω ‖_{2}^{2}\};$ (9)
(iii)	setting $F (ω) \equiv \nabla (\frac{1}{2} ‖ A ω - B ‖_{2}^{2})$, $G (ω) \equiv \partial (λ ‖ ω ‖_{1})$, $C = \{ω : ‖ ω ‖_{1} \leq β\}$ in regularized least squares by $L_{1}$ on the constrained set $C$ (RLS $L_{1}$ -C $L_{1}$): $\min_{ω \in C} \{\frac{1}{2} ‖ A ω - B ‖_{2}^{2} + λ ‖ ω ‖_{1}\};$ (10)
(iv)	setting $F (ω) \equiv \nabla (\frac{1}{2} ‖ A ω - B ‖_{2}^{2})$, $G (ω) \equiv \partial (λ ‖ ω ‖_{1})$, $C = \{ω : ‖ ω ‖_{2}^{2} \leq β\}$ in regularized least squares by $L_{1}$ on the constrained set $C$ (RLS $L_{1}$ -C $L_{2}$): $\min_{ω \in C} \{\frac{1}{2} ‖ A ω - B ‖_{2}^{2} + λ ‖ ω ‖_{1}\};$ (11)
(v)	setting $F (ω) \equiv \nabla (\frac{1}{2} ‖ A ω - B ‖_{2}^{2})$, $G (ω) \equiv \partial (λ ‖ ω ‖_{2}^{2})$, $C = \{ω : ‖ ω ‖_{1} \leq β\}$ in regularized least squares by $L_{2}$ on the constrained set $C$ (RLS $L_{2}$ -C $L_{1}$): $\min_{ω \in C} \{\frac{1}{2} ‖ A ω - B ‖_{2}^{2} + λ ‖ ω ‖_{2}^{2}\};$ (12)
(vi)	setting $F (ω) \equiv \nabla (\frac{1}{2} ‖ A ω - B ‖_{2}^{2})$, $G (ω) \equiv \partial (λ ‖ ω ‖_{2}^{2})$, $C = \{ω : ‖ ω ‖_{2}^{2} \leq β\}$ in regularized least squares by $L_{2}$ on the constrained set $C$ (RLS $L_{2}$ -C $L_{2}$): $\min_{ω \in C} \{\frac{1}{2} ‖ A ω - B ‖_{2}^{2} + λ ‖ ω ‖_{2}^{2}\} .$ (13)
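The building blocks named in these six settings can be written down concretely. The NumPy sketch below is illustrative only: the function names are hypothetical, the sigmoid is used as the activation $g$, the hidden weights `a` and biases `b` are assumed to be given, and the $L_{1}$-ball projection uses the standard sort-based method rather than anything specified in the paper. It covers the hidden layer matrix $A$, the mapping $F$, the resolvents of the two choices of $G$, and the projections onto the two choices of $C$.

```python
import numpy as np

def hidden_matrix(X, a, b):
    """Hidden layer output matrix A with entries g(a_i . x^k + b_i), g = sigmoid.
    X: (N, n) inputs, a: (n, L) hidden weights, b: (L,) hidden biases."""
    return 1.0 / (1.0 + np.exp(-(X @ a + b)))

def grad_ls(A, B):
    """F(w) = gradient of 0.5*||A w - B||_2^2."""
    return lambda w: A.T @ (A @ w - B)

def resolvent_l1(lam):
    """Resolvent of G = subdifferential of lam*||w||_1 (soft-thresholding)."""
    return lambda v, tau: np.sign(v) * np.maximum(np.abs(v) - tau * lam, 0.0)

def resolvent_l2sq(lam):
    """Resolvent of G = gradient of lam*||w||_2^2 (uniform shrinkage)."""
    return lambda v, tau: v / (1.0 + 2.0 * tau * lam)

def project_l2sq_ball(beta):
    """P_C for C = {w : ||w||_2^2 <= beta}: radial scaling to radius sqrt(beta)."""
    r = np.sqrt(beta)
    def proj(v):
        n = np.linalg.norm(v)
        return v if n <= r else v * (r / n)
    return proj

def project_l1_ball(beta):
    """P_C for C = {w : ||w||_1 <= beta}, for a 1-D vector, via sorting."""
    def proj(v):
        if np.abs(v).sum() <= beta:
            return v
        u = np.sort(np.abs(v))[::-1]                 # magnitudes, descending
        css = np.cumsum(u)
        ks = np.arange(1, v.size + 1)
        rho = np.nonzero(u - (css - beta) / ks > 0)[0][-1]
        theta = (css[rho] - beta) / (rho + 1.0)      # shrinkage level
        return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)
    return proj
```

For example, model (11) corresponds to combining `grad_ls`, `resolvent_l1`, and `project_l2sq_ball`, while models (8) and (9) use the identity in place of the projection because $C = H$.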

Table 2. The accuracy comparison between Elshewey et al. [Citation17] methods and ours.

Remark 4.1

The novelty of Algorithm 1 is that it can be applied to solve regularization models on constrained sets in several forms, (10)–(13), which differs from the algorithms in the literature and achieves good results.

The binary cross-entropy loss function calculates the loss of an example by computing the following average: $Loss = - \frac{1}{\text{output size}} \sum_{k = 1}^{\text{output size}} [O^{k} \log {\hat{O}}^{k} + (1 - O^{k}) \log (1 - {\hat{O}}^{k})],$ where ${\hat{O}}^{k}$ is the $k$-th scalar value in the model output, $O^{k}$ is the corresponding target value, and the output size is the number of scalar values in the model output.
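The loss above translates directly into a few lines of NumPy. This is a minimal sketch: the function name and the clipping constant `eps`, which guards against $\log 0$, are assumptions and not part of the paper.

```python
import numpy as np

def binary_cross_entropy(target, output, eps=1e-12):
    """Mean binary cross-entropy between targets O^k and model outputs O-hat^k."""
    target = np.asarray(target, float)
    # Clip outputs away from 0 and 1 so the logarithms stay finite (assumption).
    output = np.clip(np.asarray(output, float), eps, 1.0 - eps)
    return -np.mean(target * np.log(output) + (1.0 - target) * np.log(1.0 - output))
```

For targets `[1, 0]` and outputs `[0.5, 0.5]`, every term equals $\log 2$, so the loss is $\log 2$.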

In this work, we report four performance measures: accuracy, recall, precision, and F1-score. The four measures are defined as follows: $\begin{aligned} Precision (Pre) & = \frac{TP}{TP + FP} \times 100 \%, \\ Recall (Rec) & = \frac{TP}{TP + FN} \times 100 \%, \\ Accuracy (Acc) & = \frac{TP + TN}{TP + FP + TN + FN} \times 100 \%, \\ F1\text{-}score & = \frac{2 \times (Precision \times Recall)}{Precision + Recall}, \end{aligned}$ where $TN$ := True Negative, $FP$ := False Positive, $FN$ := False Negative, and $TP$ := True Positive.
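These four formulas are easy to check numerically. In the sketch below, the function name is hypothetical and the confusion-matrix counts are illustrative only: they were chosen because they reproduce the percentages quoted in the abstract, and the actual counts are not given in this text. The function does not guard against zero denominators.

```python
def classification_metrics(tp, fp, tn, fn):
    """Precision, recall, accuracy (in percent) and F1-score from confusion counts."""
    precision = 100.0 * tp / (tp + fp)
    recall = 100.0 * tp / (tp + fn)
    accuracy = 100.0 * (tp + tn) / (tp + fp + tn + fn)
    f1 = 2.0 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts (not taken from the paper's tables)
m = classification_metrics(tp=5, fp=3, tn=34, fn=0)
```

With these counts the precision is 5/8, the recall is 5/5, and the accuracy is 39/42, which round to 62.50%, 100%, and 92.86%, with an F1-score of about 76.92%.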

For the comparison experiments, the optimization models (8) and (9) are considered for the IFB (2) and IMFB (3) algorithms. The parameters used in our Algorithm 1, IFB (2), and IMFB (3) are given in Table 3, where $σ^{k} = \begin{cases} \frac{σ}{k^{2} ‖ ω^{k} - ω^{k - 1} ‖} & \text{if } k > K \text{ and } ω^{k} \neq ω^{k - 1}, \\ σ & \text{otherwise}, \end{cases}$ and $K$ is the number of iterations at which we want to stop.
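The rule for $σ^{k}$ can be sketched as a small helper. The function name and argument order are assumptions. Note that in the first branch the product $σ^{k} ‖ ω^{k} - ω^{k - 1} ‖$ equals $σ / k^{2}$, which is summable over $k$; this is the kind of condition on the inertial term used in the proof of Theorem 3.2.

```python
import numpy as np

def inertial_parameter(k, w, w_prev, sigma, K):
    """sigma^k: damped by k^2 * ||w - w_prev|| once k > K, constant sigma otherwise."""
    step = np.linalg.norm(np.asarray(w, float) - np.asarray(w_prev, float))
    if k > K and step > 0:
        return sigma / (k ** 2 * step)
    return sigma
```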

Table 3. The necessary parameters for our Algorithm 1, IFB (2) and IMFB (3).

In this experiment, we use the sigmoid as the activation function and 130 hidden nodes. The results of all algorithms, namely our Algorithm 1, IFB (2), and IMFB (3), are shown in Table 4.

From Table 4, we see that our Algorithm 1, applied to the RLS $L_{1}$ and RLS $L_{1}$ -C $L_{2}$ ELM models, gives the highest accuracy, precision, recall, and F1-score. To show that our algorithm is efficient without over-fitting the model, we consider the following training and validation loss and accuracy plots.

From the training-validation figures, we see that the training and validation loss and accuracy curves are almost the same throughout, even though they oscillate, indicating that our models are well fitted. This means that our machine learning model adapts well to data similar to the data on which it was trained.

Figure 7. Training-validation loss plots of IMFB (3) for the RLS $L_{1}$ ELM model.

Table 4. The performance by four evaluation metrics of our Algorithm 1, IFB (2) and IMFB (3).

5. Conclusion

In this paper, we study the extreme learning machine and introduce a new modified projective forward-backward splitting algorithm for solving variational inclusion problems. We also prove a weak convergence theorem under mild conditions on the control stepsize. The Parkinson's disease dataset from the UCI Machine Learning Repository was used for training with the proposed algorithm. The comparison with other machine learning models and existing algorithms shows that our algorithm provides the highest performance, with 92.86% accuracy, 62.50% precision, 100% recall, and 76.92% F1-score, for regularized least squares by $L_{1}$ (RLS $L_{1}$) and regularized least squares by $L_{1}$ on the constrained set $C$ (RLS $L_{1}$ -C $L_{2}$). Moreover, the training and validation loss and accuracy plots show that our algorithm yields a well-fitted model. In future research, we aim to develop a more relaxed condition on the inertial extrapolation parameter $σ^{k}$. We also plan to develop forward-backward splitting algorithms for multi-layer ELM (deep learning), which would be interesting to apply in machine learning.

Author's contributions

Writing – Original Draft and Software, W.C.; Review and Editing, S.D. All authors have contributed to the development of each section of the paper and have read and approved it for publication.

Disclosure statement

The authors have no competing interests to declare that are relevant to the content of this article.

Data availability statement

Parkinson's disease dataset is available on the UCI website (https://archive.ics.uci.edu/ml/datasets/parkinsons).

Additional information

Funding

This work is supported by National Research Council of Thailand and University of Phayao (N42A650334), and Thailand Science Research and Innovation, University of Phayao (FF67).

References

Bauschke HH, Borwein JM. On projection algorithms for solving convex feasibility problems. SIAM Rev. 1996;38:367–426. doi: 10.1137/S0036144593251710
Web of Science ®Google Scholar
Bauschke HH, Combettes PL. Convex analysis and monotone operator theory in Hilbert spaces. Berlin: Springer; 2011.
Google Scholar
Chen M, Li Y, Luo X, et al. A novel human activity recognition scheme for smart health using multilayer extreme learning machine. IEEE Internet Things J. 2019;6:1410–1418. doi: 10.1109/JIoT.6488907
Web of Science ®Google Scholar
Polyak B. Some methods of speeding up the convergence of iteration methods. USSR Comput Math Math Phys. 1964;4(5):1–17. doi: 10.1016/0041-5553(64)90137-5
Google Scholar
Bot RI, Csetnek ER. An inertial forward-backward-forward primal-dual splitting algorithm for solving monotone inclusion problems. Numer Algorithms. 2016;71(3):519–540. doi: 10.1007/s11075-015-0007-5
Web of Science ®Google Scholar
Hanjing A, Bussaban L, Suantai S. The modified viscosity approximation method with inertial technique and Forward–Backward algorithm for convex optimization model. Mathematics. 2022;10(7):1036. doi: 10.3390/math10071036
Web of Science ®Google Scholar
Inthakon W, Suantai S, Sarnmeta P, et al. A new machine learning algorithm based on optimization method for regression and classification problems. Mathematics. 2020;8(6):1007. doi: 10.3390/math8061007
Web of Science ®Google Scholar
Peeyada P, Dutta H, Shiangjen K, et al. A modified forward-backward splitting methods for the sum of two monotone operators with applications to breast cancer prediction. Math Methods Appl Sci. 2023;46(1):1251–1265. doi: 10.1002/mma.v46.1
Web of Science ®Google Scholar
Tan B, Cho SY. Strong convergence of inertial forward–backward methods for solving monotone inclusions. Appl Anal. 2022;101(15):5386–5414. doi: 10.1080/00036811.2021.1892080
Web of Science ®Google Scholar
Moudafi A, Oliny M. Convergence of a splitting inertial proximal method for monotone operators. J Comput Appl Math. 2003;155:447–454. doi: 10.1016/S0377-0427(02)00906-8
Web of Science ®Google Scholar
Peeyada P, Suparatulatorn R, Cholamjiak W. An inertial Mann forward-backward splitting algorithm of variational inclusion problems and its applications. Chaos Solit Fractals. 2022;158:112048. doi: 10.1016/j.chaos.2022.112048
Web of Science ®Google Scholar
Al-Dhaifallah M, Nisar KS, Agarwal P, et al. Modeling and identification of heat exchanger process using least squares support vector machines. Therm Sci. 2017;21(6 Part B):2859–2869. doi: 10.2298/TSCI151026204A
Web of Science ®Google Scholar
Baleanu D, Hasanabadi M, Vaziri AM, et al. A new intervention strategy for an HIV/AIDS transmission by a general fractional modeling and an optimal control approach. Chaos Solit Fractals. 2023;167:113078. doi: 10.1016/j.chaos.2022.113078
Web of Science ®Google Scholar
Baleanu D, Arshad S, Jajarmi A, et al. Dynamical behaviours and stability analysis of a generalized fractional model with a real case study. J Adv Res. 2023;48:157–173. doi: 10.1016/j.jare.2022.08.010
PubMed Web of Science ®Google Scholar
Boonsatit N, Rajchakit G, Sriraman R, et al. Finite-fixed-time synchronization of delayed Clifford-valued recurrent neural networks. Adv Differ Equ. 2021;2021(1):1–25. doi: 10.1186/s13662-021-03438-1
Web of Science ®Google Scholar
Rajchakit G, Agarwal P, Ramalingam S. Stability analysis of neural networks. Singapore: Springer; 2021.
Google Scholar
Elshewey AM, Shams MY, El-Rashidy N, et al. Bayesian optimization with support vector machine model for Parkinson disease classification. Sensors. 2023;23(4):2085. doi: 10.3390/s23042085
PubMed Web of Science ®Google Scholar
Chen HL, Wang G, Ma C, et al. An efficient hybrid kernel extreme learning machine approach for early diagnosis of Parkinson's disease. Neurocomputing. 2016;184:131–144. doi: 10.1016/j.neucom.2015.07.138
Web of Science ®Google Scholar
Bauschke HH, Combettes PL. Convex analysis and monotone operator theory in Hilbert spaces. New York: Springer; 2011. (CMS Books in Mathematics).
Google Scholar
Brézis H. Opérateurs maximaux monotones et semi-groupes de contractions dans les espaces de Hilbert. Amsterdam, Netherlands: North-Holland; 1973. (Math. Studies 5).
Zhang J, Li Y, Xiao W, et al. Non-iterative and fast deep learning: multilayer extreme learning machines. J Franklin Inst B. 2020;357(13):8925–8955. doi: 10.1016/j.jfranklin.2020.04.033
López G, Martín-Márquez V, Wang F, et al. Forward-backward splitting methods for accretive operators in Banach spaces. Abstr Appl Anal. 2012;2012:109236.
Goebel K, Kirk WA. Topics in metric fixed point theory. Cambridge: Cambridge University Press; 1990.
Auslender A, Teboulle M, Ben-Tiba S. A logarithmic-quadratic proximal method for variational inequalities. Boston, MA: Springer; 1999.
Dua D, Graff C. UCI machine learning repository. Irvine, CA: University of California, School of Information and Computer Science; 2019. Available at http://archive.ics.uci.edu/ml.
Little M, McSharry P, Roberts S, et al. Exploiting nonlinear recurrence and fractal scaling properties for voice disorder detection. Nat Preced. 2007;2007:1–1.
Little M, McSharry P, Hunter E, et al. Suitability of dysphonia measurements for telemonitoring of Parkinson's disease. Nat Preced. 2008;2008:1–1.
Cho H, Kim Y, Lee E, et al. Basic enhancement strategies when using Bayesian optimization for hyperparameter tuning of deep neural networks. IEEE Access. 2020;8:52588–52608. doi: 10.1109/Access.6287639
Jakkula V. Tutorial on support vector machine (SVM). Sch EECS Wash State Univ. 2006;37:3.
Cutler A, Cutler DR, Stevens JR. Random forests. In: Zhang C, Ma Y, editors. Ensemble machine learning. Berlin, Germany: Springer; 2012. p. 157–175.
Nick TG, Campbell KM. Logistic regression. Top Biostat. 2007;404:273–301. doi: 10.1007/978-1-59745-530-5
Zhang H. The optimality of naive Bayes. Aa. 2004;1(2):3.
Xingyu MA, Bolei MA, Qi F. Logistic regression and ridge classifier; 2022.
Kotsiantis SB. Decision trees: a recent overview. Artif Intell Rev. 2013;39:261–283. doi: 10.1007/s10462-011-9272-4
Huang GB, Zhu QY, Siew CK. Extreme learning machine: a new learning scheme of feedforward neural networks. In: 2004 IEEE international joint conference on neural networks (IEEE Cat. No. 04CH37541); 2004. Vol. 2, p. 985–990.

