
Fuzzy model-based sparse clustering with multivariate t-mixtures

Article: 2169299 | Received 19 Oct 2022, Accepted 04 Jan 2023, Published online: 09 Feb 2023

ABSTRACT

Model-based clustering is a natural choice for modeling the distribution of a data set and uncovering its real structure using a mixture of probability distributions. Many extensions of model-based clustering algorithms are available in the literature, yet obtaining the most favorable results remains a challenging and important research objective. In model-based clustering, many proposed methods build on the EM algorithm to overcome its sensitivity to initialization. However, these methods treat all feature (variable) components of the data points as equally important and therefore cannot distinguish irrelevant feature components. In most cases, a data set contains some irrelevant features and outliers/noisy points, which upset the performance of clustering algorithms. To overcome these issues, we propose a fuzzy model-based t-clustering algorithm that uses a mixture of t-distributions with an L1 regularization for the identification and selection of better features. To demonstrate its novelty and usefulness, we apply our algorithm to artificial and real data sets. We further apply the proposed method to a soil data set, collected in collaboration with and with the assistance of the Environmental Laboratory of Karakoram International University (GB) from various points/places of Gilgit-Baltistan, Pakistan. The comparison results validate the novelty and superiority of our newly proposed method on both the simulated and real data sets, as well as its effectiveness in addressing the weaknesses of existing methods.

Introduction

A central task in machine learning and pattern recognition is to divide a given data set according to its intrinsic structure into similar groups, which is famously known as clustering (Jain and Dubes, Citation1988; Mcnicholas, Citation2016). Cluster analysis, also known as unsupervised learning, is one of the most significant and successfully employed techniques, with noteworthy applications in various areas such as wireless networking and remote sensing (Abbasi and Younis, Citation2007; Gogebakan and Erol, Citation2018), computational biology (Gogebakan, Citation2021; Yang and Ali, Citation2019), image processing (Chuang et al., Citation2006), soft computing (Gogebakan, Citation2021), data segmentation (Gogebakan and Hamza, Citation2019), agriculture (Kadim and Wirnhardt, Citation2012), ecology (Rasool et al., Citation2016), data mining (Agrawal et al., Citation2005) and economics (Garibaldi et al., Citation2006). There are two major families of clustering algorithms, namely the model-based approach and the nonparametric approach (McLachlan and Basford, Citation1988). In the nonparametric approach, clustering methods are based on objective functions, with K-means, fuzzy c-means and possibilistic c-means being the most common. In the model-based approach, the data points are assumed to follow a mixture of probability distributions (Banfield and Raftery, Citation1993), where the EM (Expectation-Maximization) algorithm proposed by Dempster et al. (Citation1977) is the most common and famous approach, using maximum-likelihood estimation to infer mixture models (Biernacki and Jacques, Citation2013; Lee and Scott, Citation2012; Melnykov and Melnykov, Citation2012; Yang et al., Citation2012). A large number of model-based clustering algorithms have been proposed; among them, Yang and Ali (Citation2019), Banfield and Raftery (Citation1993), Yang, Chang-Chien, and Nataliani (Citation2019), Yang et al. (Citation2014), Fraley and Raftery (Citation2002) and Lo and Gottardo (Citation2012) are the best-known methods. Feature selection is not only an important technique in clustering but also a challenge for researchers seeking the most relevant features. The presence of irrelevant features in data sets raises several complications for clustering. First, clustering without relevant feature selection may fail to find the real structure of the data and yield a low accuracy rate. Second, for high-dimensional data sets, clustering is computationally infeasible in the presence of irrelevant features. Third, the presence of irrelevant features may also cause problems for model selection criteria. In addition, removing non-informative features may greatly enhance interpretability (Pan and Shen, Citation2007; Xie et al., Citation2007). In this connection, Tibshirani (Citation1996) introduced the idea of Lasso regularization to cope with sparsity in the context of regression analysis, and Zadeh (Citation1965) presented the idea of fuzzy sets, which are useful in many areas.

In 2014, Yang et al. (Citation2014) presented a robust fuzzy classification maximum likelihood method using the multivariate t-distribution (FCML-T). Although this method is simple and applicable to data sets with noisy points and/or outliers, it is not applicable to irrelevant-feature selection. In 2019, Yang and Ali (Citation2019) presented a fuzzy Gaussian mixture model for feature selection using Lasso regularization; however, because of the short tails of the normal distribution, it is in many cases not an appropriate choice for clustering, and it does not provide robust results, especially when the data sets contain outliers or noisy points. To overcome these issues caused by outliers and/or noisy points, we extend the fuzzy classification maximum likelihood with multivariate t-distributions using Lasso regularization, and we call the result the F-MT-Lasso clustering algorithm. To show the novelty and usefulness of the proposed F-MT-Lasso, we use simulated as well as real data sets and compare its performance with that of fuzzy model-based Gaussian clustering (F-MB-N) (Yang, Chang-Chien, and Nataliani, Citation2019), FCML-T (Yang et al., Citation2014) and the fuzzy Gaussian Lasso algorithm (FG-Lasso) (Yang and Ali, Citation2019). The results show the significance and advantages of the proposed F-MT-Lasso algorithm. The rest of the paper is organized as follows. In Section 2, we introduce the proposed fuzzy t-clustering Lasso algorithm. Section 3 gives a comparative analysis of the proposed method and some existing schemes on simulated and real data sets. In Section 4, we apply our algorithm to real data sets from the field of biosciences. Section 5 details the application of our algorithm to a real soil data set collected from various places of Gilgit-Baltistan, Pakistan, in collaboration with Karakoram International University, Gilgit-Baltistan, Pakistan. We summarize our conclusions in Section 6.

Fuzzy T-Distribution Lasso Clustering

Let a $d$-dimensional random variable $X$ follow a multivariate t-distribution with probability density function $f_t(x_i;\mu_k,\Sigma_k,v_k)$, where $\mu_k$, $\Sigma_k$ and $v_k$ are the mean, covariance and degrees of freedom, respectively. The multivariate t-density is

$$f_t(x_i;\mu_k,\Sigma_k,v_k)=\frac{\Gamma\!\left(\frac{v_k+d}{2}\right)}{(\pi v_k)^{d/2}\,\Gamma\!\left(\frac{v_k}{2}\right)|\Sigma_k|^{1/2}}\left\{1+\frac{(x_i-\mu_k)^{T}\Sigma_k^{-1}(x_i-\mu_k)}{v_k}\right\}^{-\frac{v_k+d}{2}},$$

where $(x_i-\mu_k)^{T}\Sigma_k^{-1}(x_i-\mu_k)$ is the squared Mahalanobis distance between the data point $x_i$ and the mean $\mu_k$, $\Sigma_k$ is the covariance matrix, and $\Gamma$ is the Gamma function with $\Gamma(v)=\int_0^{\infty}s^{v-1}e^{-s}\,ds$. Zadeh (Citation1965) presented the idea of fuzzy sets, and Yang et al. (Citation2014) proposed fuzzy classification maximum likelihood clustering (FCML-T) with the objective function

$$J(z,\alpha,\theta)=\sum_{i=1}^{n}\sum_{k=1}^{c}z_{ki}^{m}\ln f(x_i;\theta_k)+w\sum_{i=1}^{n}\sum_{k=1}^{c}z_{ki}^{m}\ln\alpha_k,$$

where $\theta_k=\{\mu_k,\Sigma_k,v_k\}$. In the objective function $J(z,\alpha,\theta)$, $m\in(1,\infty)$ is the fuzziness index, $w\ge0$ is a fixed constant, and the $\alpha_k$ are mixing proportions that satisfy $0\le\alpha_k\le1$ and sum to one. We extend the fuzzy classification maximum likelihood approach of Yang et al. (Citation2014) with the multivariate t-distribution, using a Lasso penalty term and common diagonal variances. The mixture of multivariate t-distributions can be viewed as a scale mixture of normal distributions: if $Y$ is a latent variable, then $x|y\sim N(x;\mu,\Sigma/y)$ with $Y\sim G\!\left(\frac{v_k}{2},\frac{v_k}{2}\right)$, where the gamma density is $f(y;A,B)=B^{A}y^{A-1}\exp(-By)\,I_{(0,\infty)}(y)/\Gamma(A)$ with $A,B>0$. We can therefore write the objective function as

$$J(z,\alpha,\theta)=\sum_{i=1}^{n}\sum_{k=1}^{c}z_{ki}^{m}\ln\!\left[N(x_i;\mu_k,\Sigma_k/y_{ki})\,G(y_{ki};v_k/2,v_k/2)\right]+w\sum_{i=1}^{n}\sum_{k=1}^{c}z_{ki}^{m}\ln\alpha_k.$$
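For concreteness, the t-density above can be evaluated in log form as in the following sketch (Python), assuming the diagonal covariance $\Sigma=\mathrm{diag}(\sigma_1^2,\ldots,\sigma_d^2)$ adopted later in this section; the function name is illustrative.

```python
import numpy as np
from scipy.special import gammaln

def mvt_logpdf(x, mu, sigma2, v):
    """ln f_t(x; mu, Sigma, v) for Sigma = diag(sigma2); x has shape (d,)."""
    d = x.shape[0]
    maha = np.sum((x - mu) ** 2 / sigma2)        # squared Mahalanobis distance
    return (gammaln((v + d) / 2.0) - gammaln(v / 2.0)
            - 0.5 * d * np.log(np.pi * v)
            - 0.5 * np.sum(np.log(sigma2))       # ln |Sigma|^(1/2)
            - 0.5 * (v + d) * np.log1p(maha / v))
```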

We further extend the fuzzy classification maximum likelihood clustering algorithm of Yang et al. (Citation2014) into a new multivariate t-distribution method by adding the term $\lambda\sum_{k=1}^{c}\sum_{p=1}^{d}|\mu_{kp}|$. Thus, we propose the new F-MT-Lasso objective function

$$J_{F\text{-}MT\text{-}Lasso}(z,\alpha,\theta)=\sum_{i=1}^{n}\sum_{k=1}^{c}z_{ki}^{m}\ln\!\left[N(x_i;\mu_k,\Sigma_k/y_{ki})\,G(y_{ki};v_k/2,v_k/2)\right]+w\sum_{i=1}^{n}\sum_{k=1}^{c}z_{ki}^{m}\ln\alpha_k-\lambda\sum_{k=1}^{c}\sum_{p=1}^{d}|\mu_{kp}|,$$

where $\lambda\ge0$ is a tuning parameter that controls the amount of shrinkage of the mean parameters. When the tuning parameter $\lambda$ is sufficiently large, some of the cluster centers $\mu_{kp}$ become exactly zero, and we discard the $p$th feature when $\mu_{kp}=0$. We use the common diagonal covariance $\Sigma_k=\Sigma=\mathrm{diag}(\sigma_1^{2},\ldots,\sigma_d^{2})$ and $w^{(t)}=0.999^{t}$. To obtain the necessary conditions for maximizing the F-MT-Lasso objective function, we use the Lagrangian:

$$\tilde J_{F\text{-}MT\text{-}Lasso}(z,\alpha,\theta)=\sum_{i=1}^{n}\sum_{k=1}^{c}z_{ki}^{m}\ln f(x_i;\theta_k)+w\sum_{i=1}^{n}\sum_{k=1}^{c}z_{ki}^{m}\ln\alpha_k-\lambda\sum_{k=1}^{c}\sum_{p=1}^{d}|\mu_{kp}|-\gamma\left(\sum_{k=1}^{c}z_{ki}-1\right)-\beta\left(\sum_{k=1}^{c}\alpha_k-1\right).$$
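As a sanity check while iterating, the objective can be evaluated directly; the following sketch uses the marginal t-density form (a simplification of the latent-$y$ decomposition above) and reuses `mvt_logpdf` from the earlier sketch.

```python
import numpy as np

def fmt_lasso_objective(X, z, alpha, mu, sigma2, v, lam, w, m):
    """J_F-MT-Lasso written with the marginal t-density; z is (n, c),
    mu is (c, d), sigma2 is (d,), alpha and v are (c,)."""
    n, c = X.shape[0], mu.shape[0]
    logf = np.array([[mvt_logpdf(X[i], mu[k], sigma2, v[k])
                      for k in range(c)] for i in range(n)])   # (n, c)
    zm = z ** m
    return (np.sum(zm * logf) + w * np.sum(zm * np.log(alpha))
            - lam * np.abs(mu).sum())                          # L1 penalty
```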

The necessary condition on $y_{ki}$ for maximizing $J_{F\text{-}MT\text{-}Lasso}(z,\alpha,\theta)$ is as follows:

$$y_{ki}=\frac{v_k+d}{(x_i-\mu_k)^{T}\Sigma_k^{-1}(x_i-\mu_k)+v_k}\tag{1}$$
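A vectorized evaluation of Eq. (1) might look as follows (a sketch assuming the common diagonal covariance); small weights $y_{ki}$ automatically down-weight outlying points, which is the source of the method's robustness.

```python
import numpy as np

def update_y(X, mu, sigma2, v):
    """Latent scale weights y_ki of Eq. (1); returns shape (c, n)."""
    d = X.shape[1]
    # squared Mahalanobis distances to each center, shape (c, n)
    maha = np.array([np.sum((X - m_k) ** 2 / sigma2, axis=1) for m_k in mu])
    return (v[:, None] + d) / (maha + v[:, None])
```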

Differentiating $J_{F\text{-}MT\text{-}Lasso}(z,\alpha,\theta)$ with respect to the fuzzy membership $z_{ki}$, we obtain the updating equation for the membership function:

$$\hat z_{ki}=\frac{\left(\ln f(x_i;\theta_k)+w\ln\alpha_k\right)^{\frac{1}{m-1}}}{\sum_{s=1}^{c}\left(\ln f(x_i;\theta_s)+w\ln\alpha_s\right)^{\frac{1}{m-1}}}\tag{2}$$

Differentiating $J_{F\text{-}MT\text{-}Lasso}(z,\alpha,\theta)$ with respect to $\alpha_k$, we obtain the mixing proportion $\hat\alpha_k$:

$$\hat\alpha_k=\frac{\sum_{i=1}^{n}z_{ki}^{m}}{\sum_{s=1}^{c}\sum_{i=1}^{n}z_{si}^{m}}\tag{3}$$
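A sketch of the membership and mixing-proportion updates is given below. Note that Eq. (2) raises the bracketed log-terms to the power $1/(m-1)$, which requires them to be positive; for a runnable sketch we substitute the numerically stable softmax-type variant $z_{ki}\propto\big(f(x_i;\theta_k)\,\alpha_k^{w}\big)^{1/(m-1)}$, which should be read as an assumption of this sketch rather than the printed formula.

```python
import numpy as np

def update_z_alpha(X, mu, sigma2, v, alpha, w, m):
    """Fuzzy memberships (cf. Eq. 2) and mixing proportions (Eq. 3);
    reuses mvt_logpdf from the earlier sketch.  The membership is a
    softmax-type variant of Eq. (2), an assumption for numerical safety."""
    n, c = X.shape[0], mu.shape[0]
    logf = np.array([[mvt_logpdf(X[i], mu[k], sigma2, v[k])
                      for k in range(c)] for i in range(n)])   # (n, c)
    g = (logf + w * np.log(alpha)) / (m - 1.0)
    g -= g.max(axis=1, keepdims=True)          # stabilize the exponential
    z = np.exp(g)
    z /= z.sum(axis=1, keepdims=True)          # rows sum to one
    zm = z ** m
    return z, zm.sum(axis=0) / zm.sum()        # Eq. (3)
```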

For the degrees of freedom, we differentiate $J_{F\text{-}MT\text{-}Lasso}(z,\alpha,\theta)$ with respect to $v_k$ and obtain the following equation:

$$\ln\frac{v_k}{2}-\psi\!\left(\frac{v_k}{2}\right)+1+\frac{\sum_{i=1}^{n}z_{ki}^{m}\left(\ln y_{ki}-y_{ki}\right)}{\sum_{i=1}^{n}z_{ki}^{m}}=0\tag{4}$$

where $\psi(u)$ is the digamma function, $\psi(u)=\frac{d}{du}\ln\Gamma(u)$. We use the decreasing learning parameter $w$ given by:

$$w^{(t)}=0.999^{t}\tag{5}$$
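Equation (4) has no closed-form solution in $v_k$, so it is solved numerically at each iteration; the sketch below brackets the root, with the search range an assumption.

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

def update_v(z, y, m, k, v_max=200.0):
    """Solve Eq. (4) for the degrees of freedom v_k of cluster k."""
    zm = z[:, k] ** m
    const = 1.0 + np.sum(zm * (np.log(y[k]) - y[k])) / np.sum(zm)
    f = lambda vv: np.log(vv / 2.0) - digamma(vv / 2.0) + const
    if f(v_max) > 0:       # no root below v_max: essentially Gaussian cluster
        return v_max
    return brentq(f, 1e-3, v_max)
```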

To get the updating equation of $\mu_{kp}$, we differentiate $J_{F\text{-}MT\text{-}Lasso}(z,\alpha,\theta)$ with respect to $\mu_{kp}$ and obtain the estimate $\hat\mu_{kp}$:

$$\hat\mu_{kp}=\begin{cases}\tilde\mu_{kp}+\dfrac{\lambda\hat\sigma_p^{2}}{\sum_{i=1}^{n}\hat z_{ki}^{m}y_{ki}}, & \text{if }\ \tilde\mu_{kp}<-\dfrac{\lambda\hat\sigma_p^{2}}{\sum_{i=1}^{n}\hat z_{ki}^{m}y_{ki}},\\[2ex] 0, & \text{if }\ \big|\tilde\mu_{kp}\big|\le\dfrac{\lambda\hat\sigma_p^{2}}{\sum_{i=1}^{n}\hat z_{ki}^{m}y_{ki}},\\[2ex] \tilde\mu_{kp}-\dfrac{\lambda\hat\sigma_p^{2}}{\sum_{i=1}^{n}\hat z_{ki}^{m}y_{ki}}, & \text{if }\ \tilde\mu_{kp}>\dfrac{\lambda\hat\sigma_p^{2}}{\sum_{i=1}^{n}\hat z_{ki}^{m}y_{ki}},\end{cases}\tag{6}$$

with

$$\tilde\mu_{kp}=\frac{\sum_{i=1}^{n}z_{ki}^{m}y_{ki}x_{ip}}{\sum_{i=1}^{n}z_{ki}^{m}y_{ki}}\tag{7}$$

where $\tilde\mu_{kp}=\sum_{i=1}^{n}z_{ki}^{m}y_{ki}x_{ip}\big/\sum_{i=1}^{n}z_{ki}^{m}y_{ki}$ is the maximum likelihood estimator (MLE) of FCML-T clustering and $\Sigma_k=\Sigma=\mathrm{diag}(\sigma_1^{2},\ldots,\sigma_d^{2})$ is the common diagonal variance. When $\lambda$ is sufficiently large in Eq. (6), some $\hat\mu_{kp}$ become exactly zero; otherwise $\tilde\mu_{kp}$ is shrunk by the amount $\lambda\hat\sigma_p^{2}/\sum_{i=1}^{n}z_{ki}^{m}y_{ki}$. Consequently, if $\big|\tilde\mu_{kp}\big|\le\lambda\hat\sigma_p^{2}/\sum_{i=1}^{n}z_{ki}^{m}y_{ki}$, we set $\hat\mu_{kp}=0$, and the $p$th feature is judged uninformative and discarded from further clustering; otherwise, the cluster center is $\tilde\mu_{kp}$ shrunk toward zero by $\lambda\hat\sigma_p^{2}/\sum_{i=1}^{n}z_{ki}^{m}y_{ki}$. To derive the updating Eq. (6) for $\hat\mu_{kp}$, we differentiate the F-MT-Lasso objective function $J_{F\text{-}MT\text{-}Lasso}(z,\alpha,\theta)$ with respect to $\mu_{kp}$ and obtain the following form:

$$\frac{\partial J_{F\text{-}MT\text{-}Lasso}(z,\alpha,\theta)}{\partial\mu_{kp}}=\frac{\sum_{i=1}^{n}z_{ki}^{m}y_{ki}(x_{ip}-\mu_{kp})}{\sigma_p^{2}}-\lambda\,\mathrm{sign}(\mu_{kp}).$$

Setting $\partial J_{F\text{-}MT\text{-}Lasso}(z,\alpha,\theta)/\partial\mu_{kp}=0$ and simplifying, we obtain

$$\hat\mu_{kp}=\tilde\mu_{kp}-\frac{\lambda\sigma_p^{2}\,\mathrm{sign}(\hat\mu_{kp})}{\sum_{i=1}^{n}\hat z_{ki}^{m}y_{ki}}.$$

Not every function is differentiable everywhere; in particular, $|\mu_{kp}|$ is not differentiable at $\mu_{kp}=0$. The set of all subgradients of a convex function $f$ at $x$ is called the subdifferential of $f$ at $x$, and we use the subderivative as a substitute for the derivative. For the absolute value function $f(x)=|x|$, the subdifferential is $\delta f(x)=\mathrm{sign}(x)$, where the sign function is defined as

$$\mathrm{sign}(x)=\begin{cases}-1, & \text{if } x<0,\\ [-1,1], & \text{if } x=0,\\ +1, & \text{if } x>0.\end{cases}$$

The absolute value function $f(x)=|x|$ and its subdifferential $\delta f(x)=\mathrm{sign}(x)$ are shown in Figure 1.

Figure 1. Sub-differential of $\delta f(x)=\mathrm{sign}(x)$.

Using this concept of the subderivative (subgradient), we obtain the updating Equation (6) for $\hat\mu_{kp}$. We consider a common diagonal covariance matrix, which is suitable for high-dimensional data sets and a good choice for feature selection in our algorithm; it is given as follows:

$$\Sigma_k=\Sigma=\mathrm{diag}(\sigma_1^{2},\ldots,\sigma_d^{2})=\begin{pmatrix}\sigma_1^{2}&0&\cdots&0\\0&\sigma_2^{2}&\cdots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\cdots&\sigma_d^{2}\end{pmatrix},\qquad p=1,\ldots,d.$$

Differentiating the objective function $J_{F\text{-}MT\text{-}Lasso}(z,\alpha,\theta)$ with respect to $\sigma_p^{2}$, $p=1,\ldots,d$, we get the updating equation for the common diagonal covariance matrix:

$$\hat\sigma_p^{2}=\frac{\sum_{k=1}^{c}\sum_{i=1}^{n}z_{ki}^{m}y_{ki}\left(x_{ip}-\mu_{kp}\right)^{2}}{\sum_{k=1}^{c}\sum_{i=1}^{n}z_{ki}^{m}y_{ki}}\tag{8}$$
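Equations (6)-(8) combine into a single M-step for the penalized means and the common diagonal variances, as in the following sketch; the three-branch rule of Eq. (6) is implemented compactly as soft-thresholding with $\mathrm{sign}(\cdot)$ and a hinge.

```python
import numpy as np

def update_mu_sigma(X, z, y, sigma2, lam, m):
    """Soft-thresholded means (Eqs. 6-7) and diagonal variances (Eq. 8)."""
    zy = (z ** m).T * y                        # (c, n): weights z_ki^m * y_ki
    wsum = zy.sum(axis=1, keepdims=True)       # sum_i z_ki^m y_ki, shape (c, 1)
    mu_tilde = zy @ X / wsum                   # Eq. (7): FCML-T weighted means
    thr = lam * sigma2 / wsum                  # per-cluster/feature threshold
    mu_hat = np.sign(mu_tilde) * np.maximum(np.abs(mu_tilde) - thr, 0.0)  # Eq. (6)
    diff2 = (X[None, :, :] - mu_hat[:, None, :]) ** 2                     # (c, n, d)
    sigma2_new = (zy[:, :, None] * diff2).sum(axis=(0, 1)) / zy.sum()     # Eq. (8)
    return mu_hat, sigma2_new
```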

Thus, we have summarized our proposed F-MT-Lasso algorithm as follows:

Algorithm F-MT-Lasso clustering algorithm

Step 1: Fix $2\le c\le n$, $\varepsilon>0$ and $m\in(1,\infty)$. Give initials $w^{(0)}=1$, $v_k^{(0)}$, $\alpha_k^{(0)}$, $\mu_{kp}^{(0)}$, $\sigma_p^{2,(0)}$ and $y_{ki}^{(0)}$. Set $\lambda=1$ and $t=1$.

Step 2: Compute $\hat z_{ki}^{(0)}$ with $w^{(0)}$, $\mu_{kp}^{(0)}$, $y_{ki}^{(0)}$, $\alpha_k^{(0)}$ and $\sigma_p^{2,(0)}$ by Eq. (2).

Step 3: Compute $\tilde\mu_{kp}^{(t)}$ with $\hat z_{ki}^{(t-1)}$ and $y_{ki}^{(t-1)}$ using Eq. (7).

Step 4: Compute $w^{(t)}$ using Eq. (5).

Step 5: Compute $\hat\alpha_k^{(t)}$ with $\hat z_{ki}^{(t-1)}$ using Eq. (3).

Step 6: Compute $\hat\sigma_p^{2,(t)}$ with $\tilde\mu_{kp}^{(t)}$, $y_{ki}^{(t-1)}$ and $\hat z_{ki}^{(t-1)}$ by Eq. (8).

Step 7: Update $\hat z_{ki}^{(t)}$ with $\tilde\mu_{kp}^{(t)}$, $y_{ki}^{(t-1)}$, $\hat\alpha_k^{(t)}$ and $\hat\sigma_p^{2,(t)}$ using Eq. (2).

Step 8: Compute $v_k^{(t)}$ with $\hat z_{ki}^{(t)}$ and $y_{ki}^{(t-1)}$ using Eq. (4).

Step 9: Compute $y_{ki}^{(t)}$ with $\tilde\mu_{kp}^{(t)}$, $v_k^{(t)}$ and $\hat\sigma_p^{2,(t)}$ using Eq. (1).

Step 10: Update $\tilde\mu_{kp}^{(t+1)}$ with $\hat z_{ki}^{(t)}$ and $y_{ki}^{(t)}$ using Eq. (7). If $\max_k\big\|\tilde\mu_k^{(t+1)}-\tilde\mu_k^{(t)}\big\|<\varepsilon$, stop; else set $t=t+1$ and return to Step 3.

Step 11: Update $\hat\sigma_p^{2,(t+1)}$ with $\tilde\mu_{kp}^{(t+1)}$, $y_{ki}^{(t)}$ and $\hat z_{ki}^{(t)}$ using Eq. (8).

Step 12: Update $\hat\mu_{kp}^{(t)}$ with $\hat z_{ki}^{(t)}$, $\tilde\mu_{kp}^{(t+1)}$, $y_{ki}^{(t)}$ and $\hat\sigma_p^{2,(t+1)}$ using Eq. (6); that is, if $\big|\tilde\mu_{kp}^{(t+1)}\big|\le\lambda\hat\sigma_p^{2,(t+1)}\big/\sum_{i=1}^{n}\big(\hat z_{ki}^{(t)}\big)^{m}y_{ki}^{(t)}$, let $\hat\mu_{kp}^{(t)}=0$; else $\hat\mu_{kp}^{(t)}=\tilde\mu_{kp}^{(t+1)}-\mathrm{sign}\big(\tilde\mu_{kp}^{(t+1)}\big)\,\lambda\hat\sigma_p^{2,(t+1)}\big/\sum_{i=1}^{n}\big(\hat z_{ki}^{(t)}\big)^{m}y_{ki}^{(t)}$.

Step 13: Increase $\lambda$ and return to Step 3, or output the results.
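For orientation, the following condensed sketch strings the updates together for a single value of $\lambda$; it reuses the helper functions from the sketches above, the initialization choices are assumptions (the algorithm leaves them open), and it is not the authors' implementation.

```python
import numpy as np

def f_mt_lasso(X, c, lam=1.0, m=2.0, eps=1e-4, max_iter=200, seed=0):
    """Single-lambda sketch of Steps 1-13 of the F-MT-Lasso algorithm."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Step 1: simple initial values (assumed choices)
    alpha = np.full(c, 1.0 / c)
    mu = X[rng.choice(n, size=c, replace=False)]   # random points as centers
    sigma2, v = X.var(axis=0), np.full(c, 5.0)
    y = update_y(X, mu, sigma2, v)
    z, alpha = update_z_alpha(X, mu, sigma2, v, alpha, 1.0, m)    # Step 2
    for t in range(1, max_iter + 1):
        mu_old = mu.copy()
        w = 0.999 ** t                                            # Step 4, Eq. (5)
        mu, sigma2 = update_mu_sigma(X, z, y, sigma2, lam, m)     # Steps 3, 6, 11-12
        z, alpha = update_z_alpha(X, mu, sigma2, v, alpha, w, m)  # Steps 5, 7
        v = np.array([update_v(z, y, m, k) for k in range(c)])    # Step 8
        y = update_y(X, mu, sigma2, v)                            # Step 9
        if np.max(np.abs(mu - mu_old)) < eps:                     # Step 10
            break
    return z, alpha, mu, sigma2, v

# Hypothetical usage: hard labels via the maximal fuzzy membership
# labels = f_mt_lasso(X, c=2, lam=50.0)[0].argmax(axis=1)
```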

Numerical Comparisons

Here, we demonstrate the merits of our proposed F-MT-Lasso algorithm on synthetic and real data sets using the accuracy rate, defined as $AR=\sum_{j=1}^{c}r_j/n$, where $r_j$ is the number of points in $C_j'$ that are also in $C_j$, $C=\{C_1,C_2,\ldots,C_c\}$ is the set of $c$ true clusters for the given data set, and $C'=\{C_1',C_2',\ldots,C_c'\}$ is the set of $c$ clusters generated by the clustering algorithm. We compare our algorithm with F-MB-N (Yang, Chang-Chien, and Nataliani, Citation2019), FCML-T (Yang et al., Citation2014) and FG-Lasso (Yang and Ali, Citation2019). The details of the data sets used are presented in Table 1.
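In code, the accuracy rate can be computed as below; since the labeling of the found clusters is arbitrary, this sketch matches clusters to true classes with the Hungarian algorithm (one common convention, assumed here).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def accuracy_rate(true_labels, pred_labels):
    """AR = sum_j r_j / n after optimally matching clusters to classes."""
    classes, clusters = np.unique(true_labels), np.unique(pred_labels)
    overlap = np.array([[np.sum((true_labels == a) & (pred_labels == b))
                         for b in clusters] for a in classes])
    rows, cols = linear_sum_assignment(-overlap)   # maximize total overlap
    return overlap[rows, cols].sum() / true_labels.size
```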

Table 1. Tabular representation of the synthetic and real data sets used.

Example 1.

In this example, a two-cluster data set with 1250 data points is generated from a Gaussian mixture model $\sum_{k=1}^{2}\alpha_k N(u_k,\Sigma_k)$ with parameters $\alpha_k=1/2$ for all $k$, $u_2=(20,3)^{T}$, and $\Sigma_1=\Sigma_2=\begin{pmatrix}1&0\\0&1\end{pmatrix}$. To the two features $x_1,x_2$ we added 350 noisy points, as shown in Figure 2(a). Since our objective is to identify relevant features, we extend the data set from the two features $x_1,x_2$ to three features $x_1,x_2,x_3$ by adding a third feature $x_3$ generated from a uniform distribution over the interval $[-2,2]$; the added feature $x_3$ is thus irrelevant by construction. We ran F-MB-N, FCML-T, FG-Lasso and F-MT-Lasso with different initializations and recorded the average over 30 random initials. The clustering results of F-MB-N, FCML-T and FG-Lasso are shown in Figure 2(b-d), and the final result of our proposed F-MT-Lasso is shown in Figure 2(f). Because of the irrelevant feature $x_3$ (with $d=3$), the clustering results of the competing methods are highly affected and show poor average accuracy rates, as reported in Table 2. In contrast, the proposed F-MT-Lasso discards the non-informative feature $x_3$ and attains the best average accuracy rate (AR = 0.921). The features discarded by FG-Lasso and F-MT-Lasso for different values of $\lambda$ are detailed in Table 3. When we increase $\lambda$ from 50 to 135, FG-Lasso discards the important feature $x_2$ ($\hat\mu_{12}=\hat\mu_{22}=0$) along with $\hat\mu_{13}=0$. Similarly, as $\lambda$ increases to 135, the proposed F-MT-Lasso sets $\hat\mu_{13}=0$ and $\hat\mu_{23}=0$, whereas FG-Lasso also discards the important component $\hat\mu_{21}=0$. Clearly, F-MT-Lasso works better and discards the third, irrelevant feature $x_3$; after discarding $x_3$, F-MT-Lasso gives the best results, which shows the merit of our method.
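A data set along these lines can be generated as in the following sketch; the first cluster center and the range of the noisy points are assumptions, since only $u_2$, the identity covariances, and the counts are recoverable from the description above.

```python
import numpy as np

def make_example1(seed=0):
    """Two Gaussian clusters, 350 uniform noisy points, and an
    irrelevant third feature x3 ~ U[-2, 2], as in Example 1."""
    rng = np.random.default_rng(seed)
    mu1 = np.array([0.0, 0.0])                   # assumed (not in the source)
    mu2 = np.array([20.0, 3.0])
    X = np.vstack([rng.multivariate_normal(mu1, np.eye(2), 625),
                   rng.multivariate_normal(mu2, np.eye(2), 625),
                   rng.uniform(-5.0, 25.0, size=(350, 2))])  # noise range assumed
    x3 = rng.uniform(-2.0, 2.0, size=(X.shape[0], 1))        # irrelevant feature
    return np.hstack([X, x3])                                # shape (1600, 3)
```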

Figure 2. (a) The original 2-cluster Gaussian data set; (b) F-MB-N clustering results; (c) FCML-T clustering results; (d) FG-Lasso clustering results; (f) F-MT-Lasso clustering results.

Table 2. Comparison of F-MB-N, FCML-T, FG-Lasso with F-MT-Lasso based on reduced feature and average AR.

Table 3. Feature reduction pattern based on λ values.

Example 2.

In this example, we consider a data set consisting of three clusters with 950 data points generated from the Gaussian mixture (GM) distribution $\sum_{k=1}^{3}\alpha_k N(u_k,\Sigma_k)$ with parameters $\alpha_k=1/3$ for all $k$, $u_1=(4,6)^{T}$, $\Sigma_1=\begin{pmatrix}3&0\\0&1\end{pmatrix}$ and $\Sigma_2=\Sigma_3=\begin{pmatrix}1&0\\0&1\end{pmatrix}$, with two features $x_1,x_2$. We added 100 noisy points to the features $x_1,x_2$ using uniform distributions over the intervals $[-5,5]$ and $[0,1]$, so the sample size becomes 1050 points; the result is shown in Figure 3(a). Since our objective is to identify relevant features, we extended the data set from the two features $x_1,x_2$ to four features $x_1,x_2,x_3,x_4$ by adding two additional features $x_3$ and $x_4$, generated from uniform distributions over the intervals $[-1,1]$ and $[-5,5]$, respectively; the third and fourth added features $x_3$ and $x_4$ are thus irrelevant. The 3-D plots of $(x_1,x_2,x_3)$ and $(x_1,x_2,x_4)$ are shown in Figure 3(b,c). We ran F-MB-N, FCML-T, FG-Lasso and F-MT-Lasso under different initializations and recorded the average over 30 random initials. The clustering results of F-MB-N, FCML-T and FG-Lasso are shown in Figure 3(d-f), and the final result of the proposed F-MT-Lasso is shown in Figure 3(g). Because of the irrelevant features $x_3$ and $x_4$ (with $d=4$), the clustering results of the competing methods are highly affected and show poor average accuracy rates, as reported in Table 4, while the proposed F-MT-Lasso discards the non-informative features $x_3$ and $x_4$ and consequently attains the best average accuracy rate (AR = 0.989). The features discarded by FG-Lasso and F-MT-Lasso for different values of $\lambda$ are detailed in Table 5. When $\lambda$ is increased to 30, both algorithms completely discard the irrelevant feature $x_3$; similarly, when $\lambda$ is increased from 60 to 111, the other irrelevant feature $x_4$ is also discarded by both methods, with the results shown in Table 5. Clearly, after discarding the irrelevant features $x_3$ and $x_4$, our proposed algorithm gives the best results.

Figure 3. (a) The original 3-cluster Gaussian data set; (b) 3-D plot of $x_1,x_2$ and $x_3$; (c) 3-D plot of $x_1,x_2$ and $x_4$; (d) F-MB-N clustering results; (e) FCML-T clustering results; (f) FG-Lasso clustering results; (g) F-MT-Lasso clustering results.

Table 4. Comparison of F-MB-N, FCML-T, FG-Lasso with F-MT-Lasso based on reduced feature and average AR.

Table 5. Feature reduction pattern based on λ values.

Example 3.

In this example, we consider a data set consisting of five clusters with 400 data points generated from the Gaussian mixture (GM) distribution $\sum_{k=1}^{5}\alpha_k N(u_k,\Sigma_k)$ with parameters $\alpha_k=1/5$ for all $k$, $u_1=(4,6)^{T}$, $u_5=(10,8)^{T}$, and $\Sigma_1=\Sigma_2=\Sigma_3=\Sigma_4=\Sigma_5=\begin{pmatrix}1&0\\0&1\end{pmatrix}$. The two features are $x_1,x_2$, and 400 noisy points generated from a uniform distribution over the interval $[-10,10]$ were added. Since our objective is to identify relevant features, we extended the data set from the two features $x_1,x_2$ to three features $x_1,x_2,x_3$ by adding one feature $x_3$ generated from a uniform distribution over the interval $[-10,10]$; the third added feature $x_3$ is thus irrelevant. The 3-D plot of $x_1,x_2,x_3$ is shown in Figure 4(a). We ran F-MB-N, FCML-T, FG-Lasso and F-MT-Lasso under different initializations and recorded the average over 30 random initials. The clustering results of F-MB-N, FCML-T and FG-Lasso are shown in Figure 4(b-d), and the final result of the proposed F-MT-Lasso is shown in Figure 4(e). Because of the irrelevant feature $x_3$ (with $d=3$), the clustering results of the competing methods are highly affected and show poor average accuracies, as reported in Table 6.

Figure 4. (a) 3-D plot of $x_1,x_2$ and $x_3$; (b) F-MB-N clustering results; (c) FCML-T clustering results; (d) FG-Lasso clustering results; (e) F-MT-Lasso clustering results.

However, the proposed F-MT-Lasso discards the non-informative feature $x_3$ and, as a result, attains the best average accuracy (AR = 1.00). The features discarded by FG-Lasso and F-MT-Lasso for different values of $\lambda$ are detailed in Table 7. When $\lambda$ is increased to 50, both algorithms completely discard the irrelevant feature $x_3$, with the obtained results shown in Table 7. Clearly, after discarding the irrelevant feature $x_3$, our proposed algorithm gives the best results, which is the advantage and merit of our method.

Table 6. Comparison of F-MB-N, FCML-T, FG-Lasso with F-MT-Lasso based on reduced feature and average AR.

Table 7. Feature reduction pattern based on λ values.

Application in the Field of Biosciences

Variable selection and dealing with outliers/noisy points are challenging and important tasks in biological studies. Owing to outliers and irrelevant features/genes in biological data sets, estimated parameters can be biased, inefficient and inconsistent. To demonstrate the effectiveness and real applicability of the proposed F-MT-Lasso, we applied it to the following five real biological data sets: seeds, Pima Indian, prostate cancer, breast cancer and soil. The soil data set was collected from Gilgit-Baltistan, Pakistan, in collaboration with Karakoram International University GB, Pakistan. Comparisons of the proposed F-MT-Lasso algorithm with F-MB-N, FCML-T and FG-Lasso are also made in the following.

Example 4.

In this example, we consider the real seeds data set from Das (Citation2014). This data set consists of 7 real-valued continuous attributes, namely area, perimeter, compactness, length of kernel, width of kernel, asymmetry coefficient and length of kernel groove. The data set comprises three different varieties of wheat, with samples labeled by number: 1-70 for the "Kama" wheat variety, 71-140 for the "Rosa" wheat variety, and 141-210 for the "Canadian" wheat variety, i.e., 70 elements each, randomly selected for the experiment. To collect this data set, high-quality visualization of the internal kernel structure was obtained using a soft X-ray technique; this technology is popular because it is nondestructive and considerably cheaper than alternatives. When the proposed F-MT-Lasso algorithm is applied to the data set, both F-MT-Lasso and FG-Lasso identify the sixth of the seven features as irrelevant. When the value of $\lambda$ is increased to 162, FG-Lasso gives $\hat\mu_{16}=\hat\mu_{26}=\hat\mu_{36}=0$, so feature six is considered irrelevant and removed from further clustering. After discarding this irrelevant feature, the average accuracy rates over 30 different initializations are AR = 0.859 for FG-Lasso, AR = 0.593 for F-MB-N, and AR = 0.628 for FCML-T. When we increase the value to $\lambda=200$, we likewise observe $\hat\mu_{16}=\hat\mu_{26}=\hat\mu_{36}=0$, and our proposed F-MT-Lasso discards feature six, the "asymmetry coefficient." After discarding this feature from the data, we execute the proposed F-MT-Lasso algorithm and obtain a better average accuracy (AR = 0.891) over 30 different initializations. The comparison of the average accuracy rates is shown in Table 8, and a graphical comparison is shown in Figure 5. This reveals that the proposed F-MT-Lasso algorithm is significant and effective for relevant feature selection on the seeds data set.

Table 8. Comparison of F-MB-N, FCML-T, FG-Lasso with F-MT-Lasso based on reduced feature and average AR.

Figure 5. Box and whisker plot of average accuracies from different methods.

Example 5.

In this example, we consider the real Pima Indian data set. It consists of 8 predictor variables and one response variable. The variables are named pregnant, plasma, blood pressure (mm Hg), triceps skin fold thickness (mm), insulin (mu U/ml), body mass index (weight in kg/(height in m)^2), diabetes pedigree function, and age (years), while the response variable indicates diabetes status (1: diabetes, 0: not). The data set has two classes. Diabetes mellitus is a very common and severe disease in many populations of the world, including American Indian tribes; among its many risk factors, the best known are parental diabetes, genetic markers, obesity and diet (Das, Citation2014). When the FG-Lasso algorithm is applied to the Pima Indian diabetes data set, increasing the value of $\lambda$ to 50 gives $\hat\mu_{15}=\hat\mu_{25}=0$, so feature five, "insulin," is identified as irrelevant; after removing it from further clustering, we obtain AR = 0.653 for FG-Lasso, AR = 0.544 for F-MB-N, and AR = 0.5083 for FCML-T, averaged over 30 different initializations. The proposed F-MT-Lasso also discards feature five, "insulin," at $\lambda=150$, and we obtain a better average accuracy rate (AR = 0.720) over 30 different initializations. The comparison of the average accuracy rates is shown in Table 9, while a graphical comparison is reflected in Figure 6. This confirms that the proposed F-MT-Lasso algorithm is also significant for relevant feature selection on the Pima Indian data set.

Table 9. Comparison of F-MB-N, FCML-T, FG-Lasso with F-MT-Lasso based on reduced feature and average AR.

Figure 6. Box and whisker plot of average accuracies from different methods.

Example 6

(Breast cancer (UCI, Citation2019)) Breast cancer is one of the most severe and most common causes of death among women worldwide. It is most frequently found in Australia/New Zealand, the United Kingdom, Sweden, Finland, Denmark, Belgium (highest rate), the Netherlands and France. According to the findings of the World Health Organization, common causes of breast cancer include tobacco use, use of alcohol, dietary factors including lack of fruit and vegetable consumption, overweight and obesity, physical inactivity, chronic infections from Helicobacter pylori, hepatitis B virus, hepatitis C virus and some types of human papillomavirus, and environmental and occupational risks including ionizing and non-ionizing radiation (Bray et al., Citation2018; Siegel et al., Citation2019). In this example, we consider a real breast cancer data set that consists of eight features, namely clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, bare nuclei, bland chromatin, normal nucleoli and mitoses, plus one output class variable, with 699 samples. When the FG-Lasso algorithm is applied to this breast cancer data set, we observe that when $\lambda$ is increased to 400, four features, namely clump thickness, marginal adhesion, normal nucleoli and mitoses, are identified as irrelevant. After removing these four features from further clustering, we obtain AR = 0.911 for FG-Lasso, AR = 0.892 for F-MB-N, and AR = 0.850 for FCML-T on average over 30 different initializations. On the other hand, the proposed F-MT-Lasso discards only feature seven, "normal nucleoli," at the same value $\lambda=400$, and we obtain an even better average accuracy (AR = 0.962) over 30 different initializations. The comparison of the average accuracies is shown in Table 10 and the graphical representation in Figure 7. This reveals that the proposed F-MT-Lasso algorithm is more significant and effective for relevant feature selection on the breast cancer data set.

Table 10. Comparison of F-MT-Lasso with F-MB-N, FCML-T and FG-Lasso.

Figure 7. Box and whisker plot of average accuracies from different methods.

Example 7

(Prostate cancer (Saifi, Citation2018)) Prostate cancer is the second most common type of cancer and the fifth leading cause of death among men worldwide, mostly occurring over the age of 70 years (Bray et al., Citation2018). This kind of cancer starts when cells in the prostate gland begin to grow out of control. The countries with the highest incidence are Australia, America, New Zealand, Norway, Sweden and Ireland (Bray et al., Citation2018). Here we consider a real prostate cancer data set consisting of 100 patients with eight features, namely radius, texture, perimeter, area, smoothness, compactness, symmetry and fractal dimension, plus one categorical diagnosis variable (benign tumors = 38 and malignant tumors = 68). When the FG-Lasso algorithm is applied to the prostate cancer data set, increasing the value of $\lambda$ to 60 identifies the features radius, texture, perimeter and area as irrelevant. After removing these irrelevant features, we obtain AR = 0.617 for FG-Lasso, AR = 0.517 for F-MB-N, and AR = 0.635 for FCML-T, averaged over 30 different initializations. When the proposed F-MT-Lasso algorithm is applied to the prostate cancer data set, the fourth feature, "area," becomes irrelevant at $\lambda=78$ and is consequently discarded. After discarding it, we obtain a better average accuracy rate (AR = 0.807) over 30 different initializations. The comparison of the average accuracy rates is shown in Table 11 and graphical comparisons are shown in Figure 8. This shows that the proposed F-MT-Lasso algorithm is more significant and effective for relevant feature selection on the prostate cancer data set.

Table 11. Comparison of F-MB-N, FCML-T, FG-Lasso with F-MT-Lasso based on reduced feature and average AR.

Figure 8. Box and whisker plot of average accuracies from different methods.

A Real Application of Soil Data from the Region of Gilgit-Baltistan, Pakistan

Finally, we apply the proposed F-MT-Lasso algorithm to a real soil data set consisting of thirty samples, ten in each cluster. The soil samples were randomly taken from 0 to 15 cm depth with a small spade and hand trowel from three regions (clusters) of Gilgit-Baltistan, namely Damote Sai (located in the Hindukush range), Bunji (located in the Himalaya) and Jalalabad (located in the Karakorum range), with the collaboration of Karakoram International University, Gilgit-Baltistan. The purpose of taking samples from three different locations is to compare the soil fertility status of the regions. The samples were dried and sieved through a 2 mm sieve for further laboratory investigation. pH was measured through a pH probe on a 1:1 (soil:water) suspension with an OAKTON PC 700 meter (McLean, Citation1983). EC was measured on a 1:5 (soil:water) suspension with a Milwaukee EC meter (SM 302) (Rayment and Higginson, Citation1992). The fertility status of the soil (NO3-N, P, K) was determined by the AB-DTPA extractable method (Jones, Citation2001). In all samples from the three regions, nitrogen was detected in the deficient or low range, and both of our methods, FG-Lasso and F-MT-Lasso, suggested discarding nitrogen from the soil data to improve the accuracy, as shown in Table 12. Hussain et al. (Citation2021) conducted research on the soil fertility of two villages in the lower Karakorum range and found the quantity of nitrogen to be within the marginal or medium range for both orchard and agricultural land, whereas Babar et al. (Citation2004) reported a deficiency of nitrogen (0.08% only) in the soil of the Gilgit region.

Table 12. Comparison of F-MB-N, FCML-T, FG-Lasso with F-MT-Lasso based on reduced feature and average AR.

In the following, scatter plots for all possible combinations of the soil parameters are shown in Figure 9, while a graphical comparison is shown in Figure 10.

Figure 9. Scatter plots for all possible combinations of pH, EC, N, P and K.

Figure 10. Box and whisker plot of average accuracies from different methods.

Conclusion

In model-based clustering, many proposed methods are based on the EM algorithm to overcome its sensitivity and initialization issues. However, these methods treat all feature (variable) components of the data points as equally important, and so they cannot distinguish the irrelevant feature components. In most cases, a data set contains some irrelevant features and outliers/noisy points that adversely affect the performance of clustering algorithms. To identify and discard those irrelevant features, and to handle the problems caused by outliers/noisy points, the multivariate t-distribution is more efficient and effective than the multivariate normal distribution because of its heavier tails. We therefore proposed a fuzzy model-based t-clustering scheme using a mixture of t-distributions with an L1 regularization for the better identification and selection of significant features, and to improve the performance of the algorithm against the sparsity that exists in the data.

We have applied the proposed F-MT-Lasso algorithm to simulated data sets as well as real data sets, including the seeds, Pima, prostate cancer, breast cancer and soil data, to show its effectiveness and usefulness. The comparative analysis shows that the proposed F-MT-Lasso algorithm is a robust choice and provides better results, with higher accuracy rates than the existing methods, for various larger values of the threshold $\lambda$. However, an open question remains: which value of the threshold $\lambda$ is optimal for feature selection in the F-MT-Lasso algorithm? Finding a good estimate of the threshold parameter $\lambda$ is very important and will be a topic of our future research.

Disclosure statement

We have no conflicts of interest to disclose.

References

  • Abbasi, A., and M. Younis. 2007. A survey on clustering algorithms for wireless sensor networks. Computer Communications 30 (14–15):2826–41. doi:10.1016/j.comcom.2007.05.024.
  • Agrawal, R., J. Gehrke, D. Gunopulos, and P. Raghavan. 2005. Automatic subspace clustering of high dimensional data for data mining applications. Data Mining and Knowledge Discovery 11 (1):5–33. doi:10.1007/s10618-005-1396-1.
  • Babar, K., R. A. Khattak, and A. Hakeem. 2004. Physico-chemical characteristics and fertility status of Gilgit soils. Journal of Agricultural Research 42 (3–4):305–12.
  • Banfield, J. D., and A. E. Raftery. 1993. Model-based Gaussian and non-Gaussian clustering. Biometrics 49 (3):803–21. doi:10.2307/2532201.
  • Biernacki, C., and J. Jacques. 2013. A generative model for rank data based on insertion sort algorithm. Computational Statistics & Data Analysis 58:162–76. doi:10.1016/j.csda.2012.08.008.
  • Bray, F., J. Ferlay, I. Soerjomataram, R. L. Siegel, L. A. Torre, and A. Jemal. 2018. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: A Cancer Journal for Clinicians 68 (6):394–424. doi:10.3322/caac.21492.
  • Chuang, K. S., H. L. Tzeng, S. Chen, et al. 2006. Fuzzy c-means clustering with spatial information for image segmentation. Computerized Medical Imaging and Graphics. 30(1):9–15. doi:10.1016/j.compmedimag.2005.10.001.
  • Das, R. N. 2014. Determinants of diabetes mellitus in the pima indian mothers and Indian medical students. The Open Diabetes Journal 7 (1):5–13. doi:10.2174/1876524601407010005.
  • Dempster, A. P., N. M. Laird, and D. B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological) 39 (1):1–22. doi:https://doi.org/10.1111/j.2517-6161.1977.tb01600.x.
  • Fraley, C., and A. E. Raftery. 2002. Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97 (458):611–31. doi:10.1198/016214502760047131.
  • Garibaldi, U., D. Costantini, S. Donadio, et al. 2006. Herding and clustering in economics: The Yule-Zipf-Simon model. Computational Economics. 27(1):115–34. doi:10.1007/s10614-005-9018-y.
  • Gogebakan, M. 2021. A novel approach for Gaussian mixture model clustering based on soft computing method. IEEE Access 9:159987–60003. doi:10.1109/ACCESS.2021.3130066.
  • Gogebakan, M., and H. Erol. 2018. A new semi-supervised classification method based on mixture model clustering for classification of multispectral data. Journal of the Indian Society of Remote Sensing 46 (8):1323–31. doi:10.1007/s12524-018-0808-9.
  • Gogebakan, M., and E. Hamza. 2019. Mixture model clustering using variable data segmentation and model selection: A case study of genetic algorithm. Mathematics Letters 5 (2):23–32. doi:10.11648/j.ml.20190502.12.
  • Hussain, A., H. Ali, F. Begum, A. Hussain, M. Khan, Y. Guan, J. Zhou, S. Din, and K. Hussain. 2021. Mapping of soil properties under different land uses in lesser karakoram range, Pakistan. Polish Journal of Environmental Studies 30 (2):1181–89. doi:10.15244/pjoes/122443.
  • Jain, A. K., and R. C. Dubes. 1988. Algorithms for clustering data. New Jersey: Prentice Hall.
  • Jones, J. B. 2001. Laboratory guide for conducting soil tests and plant analysis (No. BOOK). CRC press.
  • Kadim, T., and C. Wirnhardt. 2012. Neural network-based clustering for agriculture management. EURASIP Journal on Advances in Signal Processing 2012 (1):1–13. doi:10.1186/1687-6180-2012-200.
  • Lee, G., and C. Scott. 2012. EM algorithms for multivariate Gaussian mixture models with truncated and censored data. Computational Statistics & Data Analysis 56 (9):2816–29. doi:https://doi.org/10.1016/j.csda.2012.03.003.
  • Lo, K., and R. Gottardo. 2012. Flexible mixture modeling via the multivariate t distribution with the Box-Cox transformation: An alternative to the skew-t distribution. Statistics and Computing 22 (1):33–52. doi:10.1007/s11222-010-9204-1.
  • McLachlan, G. J., and K. E. Basford. 1988. Mixture models: Inference and applications to clustering. vol. 38 New York: M. Dekker.
  • McLean, E. O. 1983. Soil pH and lime requirement. Methods of soil analysis: Part 2 chemical and microbiological properties. 9:199–224.
  • Mcnicholas, P. D. 2016. Model-based clustering. Journal of Classification 33 (3):331–73. doi:10.1007/s00357-016-9211-9.
  • Melnykov, V., and I. Melnykov. 2012. Initializing the em algorithm in Gaussian mixture models with an unknown number of components. Computational Statistics & Data Analysis 56 (6):1381–95. doi:https://doi.org/10.1016/j.csda.2011.11.002.
  • Pan, W., and X. Shen. 2007. Penalized model-based clustering with application to variable selection. Journal of Machine Learning Research 8:1145–64.
  • Rasool, A., X. Tangfu, F. Farooqi, et al. 2016. Arsenic and heavy metal contaminations in the tube well water of Punjab, Pakistan and risk assessment: A case study. Ecological Engineering 95:90–100. doi:10.1016/j.ecoleng.2016.06.034.
  • Rayment, G. E., and F. R. Higginson. 1992. Australian laboratory handbook of soil and water chemical methods. Inkata Press Pty Ltd.
  • Saifi, S. Prostate cancer dataset. https://www.kaggle.com/sajidsaifi/prostate-cancer
  • Siegel, R. L., K. D. Miller, and A. Jemal. 2019. Cancer statistics, 2019. CA: A Cancer Journal for Clinicians 69 (1):7–34. doi:https://doi.org/10.3322/caac.21551.
  • Tibshirani, R. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological) 58 (1):267–88. doi:10.1111/j.2517-6161.1996.tb02080.x.
  • UCI Machine Learning Repository. 2019. World health statistics. Geneva. https://archive.ics.uci.edu/ml/index.php
  • Xie, B., W. Pan, and X. Shen. 2007. Variable selection in penalized model-based clustering via regularization on grouped parameters. Biometrics 64 (3):921–30. doi:10.1111/j.1541-0420.2007.00955.x.
  • Yang, M. S., and W. Ali. 2019. Fuzzy Gaussian Lasso clustering with application to cancer data. Mathematical Biosciences and Engineering 17 (1):250–65. doi:10.3934/mbe.2020014.
  • Yang, M. S., Y. C. T., and Y. C. Lin. 2014. Robust fuzzy classification maximum likelihood clustering with multivariate t-distributions. International Journal of Fuzzy Systems 16:566–76.
  • Yang, M. S., S. J. Chang-Chien, and Y. Nataliani. 2019. Unsupervised fuzzy model-based Gaussian clustering. Information Sciences 481:1–23. doi:10.1016/j.ins.2018.12.059.
  • Yang, M. S., C. Y. Lai, and C. Y. Lin. 2012. A robust EM clustering algorithm for Gaussian mixture models. Pattern Recognition 45 (11):3950–61. doi:10.1016/j.patcog.2012.04.031.
  • Zadeh, L. A. 1965. Fuzzy sets. Information and Control 8 (3):338–53. doi:https://doi.org/10.1016/S0019-9958(65)90241-X.