
Statistical Theory and Related Fields Volume 4, 2020 - Issue 2


Short Communications

Empirical likelihood estimation in multivariate mixture models with repeated measurements

Yuejiao Fu, Department of Mathematics and Statistics, York University, Toronto, Canada (ORCID: https://orcid.org/0000-0001-8606-570X)

Yukun Liu, School of Statistics, East China Normal University, Shanghai, China (corresponding author)

Hsiao-Hsuan Wang, Department of Mathematics and Statistics, York University, Toronto, Canada

Xiaogang Wang, Department of Mathematics and Statistics, York University, Toronto, Canada; Institute of Data Science, Tsinghua University, Beijing, China

Pages 152-160 | Received 12 Nov 2018, Accepted 07 Jun 2019, Published online: 19 Jun 2019

https://doi.org/10.1080/24754269.2019.1630544


Abstract

Multivariate mixtures are encountered in situations where the data are repeated or clustered measurements in the presence of heterogeneity among the observations with unknown proportions. In such situations, the main interest may be not only in estimating the component parameters, but also in obtaining reliable estimates of the mixing proportions. In this paper, we propose an empirical likelihood approach combined with a novel dimension reduction procedure for estimating parameters of a two-component multivariate mixture model. The performance of the new method is compared to fully parametric as well as almost nonparametric methods used in the literature.

Keywords:

Empirical likelihood
estimating equation
repeated measurements
multivariate mixture model

1. Introduction

Mixture models provide a flexible way of modelling complex data obtained from a population with observed or unobserved heterogeneity. Mixture models have been applied in astronomy, biology, fishery, human genetics, and other scientific areas of research. See Titterington, Smith, and Makov (1985), Lindsay (1995), McLachlan and Peel (2000), and references therein.

We consider a special multivariate mixture model where repeated measurements are available for each subject. Let $X_{1}, \dots, X_{n}$ be independent and identically distributed (i.i.d.) d-variate random vectors from a finite mixture model with m components. If the elements of the vector $X_{i}$ are independent conditional on belonging to a subpopulation, then the mixture density is given by $h (x) = \sum_{j = 1}^{m} π_{j} \prod_{r = 1}^{d} f_{j r} (x_{r}),$ (1) where the $π_{j}$'s are mixing proportions such that $\sum_{j = 1}^{m} π_{j} = 1$, $π_{j} > 0$ for all j, and $f (\cdot)$, with or without subscripts, denotes a univariate density function.

The above data structure is quite common, especially in the social sciences, where measurements are taken repeatedly for various reasons. For example, the goal of research on preschool children's inclusion task responses is to study different solution strategies with which young children solve a given cognitive task. The solution strategy is often called the latent variable since it is hidden and unobservable. A group of preschool children can be considered as a sample from a mixture model where the components correspond to the various solution strategies; see Thomas and Horton (1997). In a simplified setting, one could assume that there are two main solution strategies, which leads to a mixture model with two components.

Many researchers have studied the nonparametric identifiability of the above multivariate mixture model. Hall and Zhou (2003) showed that model (1) is always nonparametrically unidentifiable when d=2 and m=2. Under some mild regularity conditions, Hall and Zhou (2003) proved that the two-component mixture model is nonparametrically identifiable for $d \geq 3$. Kasahara and Shimotsu (2014) discussed the identifiability of the number of components in multivariate mixture models in which each component distribution has independent marginals. Hettmansperger and Thomas (2000) considered the situation where the elements of the vector $X_{i}$ are not only conditionally independent but also identically distributed. Under such an assumption, the mixture density (1) can be rewritten as $h (x) = \sum_{j = 1}^{m} π_{j} \prod_{r = 1}^{d} f_{j} (x_{r}) .$ (2) They proposed an almost nonparametric approach to estimate the mixing proportions. Their key idea is to categorise data into 0 or 1 by setting an optimal cut point and then apply the EM algorithm to estimate the mixing proportion in the resulting binomial mixture models. Cruz-Medina, Hettmansperger, and Thomas (2004) extended the work of Hettmansperger and Thomas (2000) by transforming the observed vector into a count vector, which leads to a multinomial mixture model.
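To make the cut-point idea concrete, the following is a minimal sketch and not the implementation of Hettmansperger and Thomas (2000). It assumes a user-supplied cut point `cut` rather than an optimal one, NumPy as the numerical library, and simulated normal data; the function name `binomial_mixture_em` and the starting values are hypothetical. Under model (2) with m=2, the number of a subject's d measurements falling at or below the cut point is binomial within each component, so the counts follow a two-component binomial mixture.

```python
import numpy as np

def binomial_mixture_em(counts, d, n_iter=500):
    """EM for the two-component mixture pi*Bin(d, p1) + (1 - pi)*Bin(d, p2)."""
    counts = np.asarray(counts, dtype=float)
    pi, p1, p2 = 0.5, 0.25, 0.75  # arbitrary starting values (an assumption)
    for _ in range(n_iter):
        # E-step: posterior probability that each subject belongs to component 1.
        # The binomial coefficient cancels in the ratio, so it is omitted.
        a = pi * p1**counts * (1.0 - p1) ** (d - counts)
        b = (1.0 - pi) * p2**counts * (1.0 - p2) ** (d - counts)
        w = a / (a + b)
        # M-step: weighted proportion and weighted success probabilities.
        pi = w.mean()
        p1 = (w * counts).sum() / (d * w.sum())
        p2 = ((1.0 - w) * counts).sum() / (d * (1.0 - w).sum())
    return pi, p1, p2

# Illustration on simulated data (not data from the article).
rng = np.random.default_rng(0)
n, d, cut = 400, 6, 1.0
in_first = rng.random(n) < 0.3
x = np.where(in_first[:, None],
             rng.normal(0.0, 1.0, (n, d)),
             rng.normal(2.0, 1.0, (n, d)))
counts = (x <= cut).sum(axis=1)  # measurements at or below the cut point
pi_hat, p1_hat, p2_hat = binomial_mixture_em(counts, d)
```

The component labels returned by the EM iteration are arbitrary: `pi_hat` is the weight of whichever component ends up with success probability `p1_hat`.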

To avoid possible loss of efficiency in categorising continuous data into count data, we propose a nonparametric approach to estimate the mixing proportions using empirical likelihood (EL). The EL, which was first introduced by Owen (1988), is a nonparametric method of inference based on a data-driven likelihood ratio function. This nonparametric and likelihood-based approach has become one of the most effective statistical methods. See Owen (2001) for a comprehensive review. As shown in Qin and Lawless (1994), the EL is an efficient tool for estimating parameters because it incorporates estimating equations into the constrained maximisation of the empirical likelihood function.

We first develop the proposed methodology for the 3-dimensional mixture models, and later on extend it to higher dimensions. For the multivariate mixture model, we propose linking the various moment estimating equations through the EL to provide a more efficient estimation. In the d-dimensional mixture model, there are $2^{d} - 1$ moment estimating equations. When d is large, it is impracticable to search for the optimal solution. We propose a simple and intuitive bootstrap-like modification of the method. First we obtain K sets of three indices chosen randomly and without replacement from $1, 2, \dots, d$ , and then multiply the K nonparametric likelihoods pertinent to the chosen indices to obtain the profile empirical likelihood ratio function.

Our simulation results show that, when the parametric model is correctly specified, our EL estimators perform similarly to the parametric estimators. However, when the parametric model is misspecified, the EL estimators perform uniformly better than the parametric estimators and the almost nonparametric estimators.

The paper is organised as follows. The proposed empirical likelihood approach for multivariate mixture model and its theoretical properties are presented in Section 2. The extension to d-dimensional (d>3) mixtures is also presented. Simulation studies and real data analysis are provided in Section 3. Discussions are given in Section 4.

2. Methodology

We first discuss the methodology for the three-variate mixture model, and then extend to multivariate mixtures with higher dimensions.

2.1. Three-variate mixture model

Let $X = (X_{1}, X_{2}, X_{3})^{T}$ be a 3-dimensional random vector with distribution function $H (x)$ and joint probability density function $h (x) = π \prod_{i = 1}^{3} f_{1} (x_{i}) + (1 - π) \prod_{i = 1}^{3} f_{2} (x_{i}),$ (3) where $0 \leq π \leq 1$, and the component density functions $f_{1}$ and $f_{2}$ are different but unspecified. This model is a special case of model (2) with m=2 and d=3.

The parameters of interest are the expectations of the random variables and the mixing proportion π. Suppose $μ_{0}$ and $μ_{1}$ are the expected values of the two components, $μ_{0} = \int x f_{1} (x) d x$ and $μ_{1} = \int x f_{2} (x) d x,$ and that they satisfy $μ_{0} < μ_{1}$. We then have the following moment estimating equations $\begin{aligned} E (X_{1} X_{2} X_{3}) & = π μ_{0}^{3} + (1 - π) μ_{1}^{3}, \\ E (X_{1} X_{2}) & = E (X_{1} X_{3}) = E (X_{2} X_{3}) = π μ_{0}^{2} + (1 - π) μ_{1}^{2}, \\ E (X_{1}) & = E (X_{2}) = E (X_{3}) = π μ_{0} + (1 - π) μ_{1} . \end{aligned}$ There are seven estimating equations in total with three unknown parameters $(π, μ_{0}, μ_{1})$.
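As a worked illustration of where these equations come from, the first one can be obtained by conditioning on the latent component label. The symbol $Z \in \{1, 2\}$ for that label is introduced here only for illustration and is not part of the article's notation; the second line uses the conditional independence of the three coordinates.

```latex
\begin{aligned}
E(X_1 X_2 X_3)
  &= π \, E(X_1 X_2 X_3 \mid Z = 1) + (1 - π) \, E(X_1 X_2 X_3 \mid Z = 2) \\
  &= π \prod_{r=1}^{3} E(X_r \mid Z = 1) + (1 - π) \prod_{r=1}^{3} E(X_r \mid Z = 2) \\
  &= π μ_0^{3} + (1 - π) μ_1^{3} .
\end{aligned}
```

The same argument applied to each of the three pairs and each of the three single coordinates gives the remaining equations, which accounts for the total of 1 + 3 + 3 = 7.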

Let $x_{i} = (x_{i 1}, x_{i 2}, x_{i 3})^{T}$, $i = 1, \dots, n$, be i.i.d. observations from the multivariate mixture model (3), and $p_{i} = d H (x_{i})$. According to Owen (1988), the EL function based on the observed data is $\prod_{i = 1}^{n} d H (x_{i}) = \prod_{i = 1}^{n} p_{i} .$ (4) Let $θ = (π, μ_{0}, μ_{1})^{T}$. For the distribution $H (x)$ under study, feasible $p_{i}$'s satisfy $\sum_{i = 1}^{n} p_{i} = 1, \quad p_{i} \geq 0, \quad \text{and} \quad \sum_{i = 1}^{n} p_{i} g (x_{i}, θ) = 0,$ (5) where $g (x_{i}, θ) = (g_{1} (x_{i}, θ), g_{2}^{T} (x_{i}, θ), g_{3}^{T} (x_{i}, θ))^{T}$ (6) with $g_{1} (x_{i}, θ) = x_{i 1} x_{i 2} x_{i 3} - π μ_{0}^{3} - (1 - π) μ_{1}^{3}$, $g_{2} (x_{i}, θ) = \begin{pmatrix} x_{i 1} x_{i 2} - π μ_{0}^{2} - (1 - π) μ_{1}^{2} \\ x_{i 1} x_{i 3} - π μ_{0}^{2} - (1 - π) μ_{1}^{2} \\ x_{i 2} x_{i 3} - π μ_{0}^{2} - (1 - π) μ_{1}^{2} \end{pmatrix}$ and $g_{3} (x_{i}, θ) = \begin{pmatrix} x_{i 1} - π μ_{0} - (1 - π) μ_{1} \\ x_{i 2} - π μ_{0} - (1 - π) μ_{1} \\ x_{i 3} - π μ_{0} - (1 - π) μ_{1} \end{pmatrix} .$ Inference on $θ$ is usually made through the profile likelihood, which is obtained by maximising (4) with respect to the $p_{i}$'s subject to the constraints in (5). Up to a constant not depending on $θ$, the resulting empirical log-likelihood is $ℓ (θ) = - \sum_{i = 1}^{n} \log \{1 + λ^{T} g (x_{i}, θ)\},$ where $λ$ is the Lagrange multiplier determined by $\frac{1}{n} \sum_{i = 1}^{n} \frac{g (x_{i}, θ)}{1 + λ^{T} g (x_{i}, θ)} = 0 .$ We can show that in an $O (n^{- 1 / 3})$ neighbourhood of the true value of $θ$, $λ = λ (θ)$ is determined uniquely as an implicit function of $θ$.

We denote the maximum empirical likelihood estimators as $\hat{θ} = (\hat{π}, {\hat{μ}}_{0}, {\hat{μ}}_{1})^{T}$. Their asymptotic properties are given in the following theorem by Qin and Lawless (1994). When $θ$ takes its true value $θ_{0}$, we write $g (x)$ for $g (x, θ_{0})$ for short.
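The computation of $ℓ (θ)$ at a fixed $θ$ can be sketched as follows. This is a minimal illustration under stated assumptions and not the authors' code: it uses NumPy, simulated normal mixture data, and a damped Newton iteration for the Lagrange multiplier, which is a common choice for empirical likelihood but is not specified in the article. The function names `g_fun`, `solve_lambda` and `profile_loglik` are hypothetical.

```python
import numpy as np

def g_fun(x, theta):
    """Estimating functions (6) for an (n, 3) array x; returns an (n, 7) array."""
    pi, mu0, mu1 = theta
    m1 = pi * mu0 + (1 - pi) * mu1        # E(X_r)
    m2 = pi * mu0**2 + (1 - pi) * mu1**2  # E(X_r X_s), r != s
    m3 = pi * mu0**3 + (1 - pi) * mu1**3  # E(X_1 X_2 X_3)
    x1, x2, x3 = x[:, 0], x[:, 1], x[:, 2]
    return np.column_stack([x1 * x2 * x3 - m3,
                            x1 * x2 - m2, x1 * x3 - m2, x2 * x3 - m2,
                            x1 - m1, x2 - m1, x3 - m1])

def solve_lambda(g, max_iter=100, tol=1e-10):
    """Damped Newton iteration for (1/n) sum_i g_i / (1 + lam' g_i) = 0."""
    n, q = g.shape
    lam, obj = np.zeros(q), 0.0  # obj = sum_i log(1 + lam' g_i), concave in lam
    for _ in range(max_iter):
        gw = g / (1.0 + g @ lam)[:, None]
        score = gw.mean(axis=0)
        if np.linalg.norm(score) < tol:
            break
        step = np.linalg.solve(gw.T @ gw / n, score)
        t = 1.0
        while t > 1e-10:
            cand = lam + t * step
            w = 1.0 + g @ cand
            # Accept only if all implied weights stay positive and obj does not decrease.
            if np.all(w > 0) and np.log(w).sum() >= obj:
                break
            t /= 2.0
        else:
            return lam  # line search failed; return the current iterate
        lam, obj = cand, np.log(w).sum()
    return lam

def profile_loglik(x, theta):
    """Empirical log-likelihood l(theta) and the multiplier lambda(theta)."""
    g = g_fun(x, theta)
    lam = solve_lambda(g)
    return -np.log(1.0 + g @ lam).sum(), lam

# Simulated illustration: pi = 0.5, components N(0, 1) and N(2, 1).
rng = np.random.default_rng(1)
n, theta0 = 400, (0.5, 0.0, 2.0)
in_first = rng.random(n) < theta0[0]
x = np.where(in_first[:, None],
             rng.normal(theta0[1], 1.0, (n, 3)),
             rng.normal(theta0[2], 1.0, (n, 3)))
ell0, lam0 = profile_loglik(x, theta0)
```

The estimator $\hat{θ}$ would then be obtained by maximising `profile_loglik` over $θ$ with a general-purpose optimiser, which is omitted here.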

Theorem 2.1

Under the regularity conditions specified in Qin and Lawless (1994), as n goes to infinity, $\sqrt{n} (\hat{θ} - θ_{0}) \overset{d}{⟶} N (0, V_{1}),$ where

$V_{1} = \left[ E \left\{\frac{\partial g (X)}{\partial θ}\right\}^{T} \left\{E g (X) g^{T} (X)\right\}^{- 1} E \left\{\frac{\partial g (X)}{\partial θ}\right\} \right]^{- 1} .$

With $(1 (X_{1} \leq t), 1 (X_{2} \leq t), 1 (X_{3} \leq t))$ in place of $X$ , we can estimate the underlying distribution functions $F_{1} (t)$ and $F_{2} (t)$ . The asymptotic normality of the resulting empirical likelihood estimators can be established in a similar way to Theorem 2.1.

2.2. Multivariate mixtures with higher dimensions

We now extend the methodology discussed in the previous section to the case with d>3. Suppose the d-variate data $w_{i} = (w_{i 1}, \dots, w_{i d})^{T}, i = 1, \dots, n$, arise from the mixture model with the following mixture density $h (w_{i}) = π \prod_{j = 1}^{d} f_{1} (w_{i j}) + (1 - π) \prod_{j = 1}^{d} f_{2} (w_{i j}) .$ In principle, we can adopt the same approach as in the case d=3 in order to make inferences about $θ$. When d is large, however, the number of estimating equations we must deal with is $\binom{d}{d} + \binom{d}{d - 1} + \dots + \binom{d}{1} = 2^{d} - 1,$ which can be extremely large. Consequently, it is impractical to accommodate that many estimating equations in the empirical likelihood setup.

We now propose a simple and intuitive solution to the high-dimensional problem. Let $M_{d} = \binom{d}{3}$, and $Ω_{i}$ ($i = 1, 2, \dots, M_{d}$) be all the possible samples of size 3 from $\{1, 2, \dots, d\}$ drawn by simple random sampling without replacement. We randomly select K sets from $\{Ω_{1}, \dots, Ω_{M_{d}}\}$ by simple random sampling without replacement. Let $Ω_{k}^{*} = \{s_{k 1}, s_{k 2}, s_{k 3}\}$ ($k = 1, 2, \dots, K$) be the resulting K index sets, and $u_{k i} = (x_{k i}, y_{k i}, z_{k i})^{T}$ denote $(w_{i, s_{k 1}}, w_{i, s_{k 2}}, w_{i, s_{k 3}})^{T}$. We assume $s_{k 1} < s_{k 2} < s_{k 3}$ for each k, and treat the data with different $Ω_{k}^{*}$ as independent samples. The profile empirical likelihood ratio function of $θ$ based on the selected index sets is $R (θ) = \max \left\{\prod_{k = 1}^{K} \prod_{i = 1}^{n} (n p_{k i}) \,\middle|\, \sum_{i = 1}^{n} p_{k i} = 1, \ p_{k i} \geq 0, \ \sum_{i = 1}^{n} p_{k i} g (u_{k i}, θ) = 0, \ k = 1, \dots, K\right\},$ where the function $g$ is defined in (6).
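The selection step can be sketched with the Python standard library alone. This is an illustration under assumptions: the function name `select_index_sets` and the `seed` argument are hypothetical and are not taken from the article.

```python
import itertools
import random

def select_index_sets(d, K, seed=None):
    """Draw K distinct index triples from {1, ..., d} without replacement."""
    # All M_d = C(d, 3) triples; each tuple is already ordered s1 < s2 < s3.
    all_triples = list(itertools.combinations(range(1, d + 1), 3))
    if K > len(all_triples):
        raise ValueError("K cannot exceed the number of available triples")
    return random.Random(seed).sample(all_triples, K)

# The configuration used in the article's simulations: d = 6 and K = 8.
omega_star = select_index_sets(d=6, K=8, seed=0)
```

For subject i with measurements stored in a zero-based sequence `w_i`, the sub-vector for a selected triple `(s1, s2, s3)` would be `(w_i[s1 - 1], w_i[s2 - 1], w_i[s3 - 1])`.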

Applying the method of constrained optimisation, we have $G = \sum_{k = 1}^{K} \sum_{i = 1}^{n} \log (n p_{k i}) - n \sum_{k = 1}^{K} \sum_{i = 1}^{n} p_{k i} λ_{k}^{T} g (u_{k i}, θ) + \sum_{k = 1}^{K} γ_{k} \left(\sum_{i = 1}^{n} p_{k i} - 1\right),$ where $λ_{k}$ and $γ_{k}$ are the Lagrange multipliers. Setting the first derivative of G with respect to $p_{k i}$ to zero, we have $\frac{\partial G}{\partial p_{k i}} = \frac{1}{p_{k i}} - n λ_{k}^{T} g (u_{k i}, θ) + γ_{k} = 0.$ Multiplying both sides of the above equation by $p_{k i}$ and summing over i gives $\sum_{i = 1}^{n} p_{k i} \frac{\partial G}{\partial p_{k i}} = n + γ_{k} = 0,$ which leads to $γ_{k} = - n$. Therefore, the maximum of $\prod_{k = 1}^{K} \prod_{i = 1}^{n} (n p_{k i})$ is attained at ${\hat{p}}_{k i} = \frac{1}{n} \frac{1}{1 + λ_{k}^{T} g (u_{k i}, θ)}, \quad k = 1, \dots, K,$ where the Lagrange multipliers $λ_{k} = λ_{k} (θ)$ are the solutions to $\frac{1}{n} \sum_{i = 1}^{n} \frac{g (u_{k i}, θ)}{1 + λ_{k}^{T} g (u_{k i}, θ)} = 0.$ Substituting ${\hat{p}}_{k i}$ back and taking the logarithm, we have the profile empirical log-likelihood ratio function of $θ$, $ℓ (θ) = \log \{R (θ)\} = - \sum_{k = 1}^{K} \sum_{i = 1}^{n} \log \{1 + λ_{k}^{T} g (u_{k i}, θ)\} .$ We show that with probability tending to one, there must be a local maximum point in a very small neighbourhood of the true parameter value of $θ$. Let $Ω^{*} = \{Ω_{1}^{*}, \dots, Ω_{K}^{*}\}$.

Lemma 2.1

Let $θ_{0} = (π_{*}, μ_{0 *}, μ_{1 *})$ be the true value of $θ$. Suppose $\int | x |^{9} d F_{0} (x) + \int | x |^{9} d F_{1} (x) < \infty$, $π_{*} \in (0, 1)$ and $μ_{0 *} \neq μ_{1 *}$, and that $F_{0}$ and $F_{1}$ are non-degenerate distributions. Conditioning on $Ω^{*}$, as $n \to \infty$, $ℓ (θ)$ attains its maximum value at some point $\hat{θ}$ with probability 1 in the interior of the ball $∥ θ - θ_{0} ∥ \leq n^{- 1 / 3}$. Let $\hat{λ} = ({\hat{λ}}_{1}^{T}, \dots, {\hat{λ}}_{K}^{T})^{T}$ with ${\hat{λ}}_{k} = λ_{k} (\hat{θ})$. Consequently, $\hat{θ}$ and $\hat{λ}$ satisfy $Q_{k n} (\hat{θ}, {\hat{λ}}_{k}) = 0 \ \text{for} \ k = 1, \dots, K, \quad \text{and} \quad Q_{0 n} (\hat{θ}, \hat{λ}) = 0,$ where $\begin{aligned} Q_{k n} (θ, λ) & = \frac{1}{n} \sum_{i = 1}^{n} \frac{g (u_{k i}, θ)}{1 + λ_{k}^{T} g (u_{k i}, θ)}, \\ Q_{0 n} (θ, λ) & = \frac{1}{n} \sum_{k = 1}^{K} \sum_{i = 1}^{n} \frac{1}{1 + λ_{k}^{T} g (u_{k i}, θ)} \left(\frac{\partial g (u_{k i}, θ)}{\partial θ}\right)^{T} λ_{k} . \end{aligned}$

Lemma 2.1 implies that the proposed EL estimator $\hat{θ}$ is consistent. Based on Lemma 2.1, we further establish the asymptotic normality of $\hat{θ}$ in the following theorem. This result is an extension of Theorem 1 in Qin and Lawless (1994). It accounts for the correlation structure of the selected elements within the random vectors.

Theorem 2.2

Assume the conditions of Lemma 2.1. Let $S_{11} = E \{g (X) g^{T} (X)\}$, $S_{12} = S_{21}^{T} = - E \{\partial g (X) / \partial θ^{T}\}$, and $Σ_{off} = \frac{1}{K (K - 1)} \sum_{1 \leq k \neq j \leq K} E \{g (u_{k 1}) g^{T} (u_{j 1}) \mid Ω^{*}\} .$ Conditioning on $Ω^{*}$, as n goes to infinity, $\sqrt{n} (\hat{θ} - θ_{0})$ converges in distribution to $N (0, V_{2})$, where $V_{2} = \frac{1}{K} (S_{21} S_{11}^{- 1} S_{12})^{- 1} + \frac{K - 1}{K} (S_{21} S_{11}^{- 1} S_{12})^{- 1} (S_{21} S_{11}^{- 1}) Σ_{off} (S_{11}^{- 1} S_{12}) (S_{21} S_{11}^{- 1} S_{12})^{- 1} .$

If there are no common elements in $Ω_{k}^{*}$ and $Ω_{j}^{*}$, then $E \{g (u_{k 1}) g^{T} (u_{j 1}) \mid Ω^{*}\} = 0$. Further, if d is quite large, and there are no common elements in any pair of $Ω_{k}^{*}$ and $Ω_{j}^{*}$ ($k \neq j$), then $Σ_{off} = 0$, and $V_{2} = (S_{21} S_{11}^{- 1} S_{12})^{- 1} / K$. At the other extreme, if $Ω_{k}^{*} = Ω_{1}^{*}$ for $k = 2, \dots, K$, then $Σ_{off} = S_{11}$, and $V_{2} = (S_{21} S_{11}^{- 1} S_{12})^{- 1}$. Therefore, the second term in $V_{2}$ stands for the efficiency loss due to the fact that some data are used more than once.
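The second extreme case can be verified directly from the expression for $V_{2}$ in Theorem 2.2. In the short derivation below, the shorthand $A = S_{21} S_{11}^{-1} S_{12}$ is introduced only for this illustration.

```latex
\begin{aligned}
V_2 \big|_{Σ_{off} = S_{11}}
  &= \frac{1}{K} A^{-1}
     + \frac{K - 1}{K} A^{-1} (S_{21} S_{11}^{-1}) \, S_{11} \, (S_{11}^{-1} S_{12}) A^{-1} \\
  &= \frac{1}{K} A^{-1} + \frac{K - 1}{K} A^{-1} (S_{21} S_{11}^{-1} S_{12}) A^{-1} \\
  &= \frac{1}{K} A^{-1} + \frac{K - 1}{K} A^{-1}
   = A^{-1} .
\end{aligned}
```

Setting $Σ_{off} = 0$ instead removes the second term altogether and leaves $V_{2} = A^{-1} / K$, which is the first extreme case.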

3. Simulation studies and data analysis

3.1. Simulation studies

We have carried out simulations to evaluate the finite-sample performance of the proposed empirical likelihood estimators (EL). For comparison, we have also considered two of its competitors: the maximum likelihood estimators (ML) under the multivariate normal mixture model, and the almost nonparametric estimators based on multinomial mixtures (Cruz-Medina et al., 2004; MN for short). Both the ML and MN estimators can be calculated by the EM algorithm.

We generate data from the mixture model (3). Different specifications of component distributions $f_{1}$ and $f_{2}$ are listed below:

(Normal mixtures) $f_{1}$ and $f_{2}$ are the density functions of $N (μ_{1}, 1)$ and $N (μ_{2}, 1)$ , respectively. Here $μ_{1} = 0$ and $μ_{2} = 1$ or 2.
(Non-central t mixtures) $f_{1}$ and $f_{2}$ are the density functions of $t (r, a (r) μ_{1})$ and $t (r, a (r) μ_{2})$, respectively. Here $t (r, a (r) μ)$ denotes a t-distribution with r degrees of freedom, non-centrality parameter $a (r) μ$, and mean μ, where $a (r) = \sqrt{2 / r} \, Γ (r / 2) / Γ ((r - 1) / 2)$. Here r=4, $μ_{1} = 0$, and $μ_{2} = 1.5$ or 2.
(Chi-square mixtures) $f_{1}$ and $f_{2}$ are the density functions of $χ_{μ_{1}}^{2}$ and $χ_{μ_{2}}^{2}$ . Here $μ_{1} = 5$ and $μ_{2} = 10$ or 20.

For each setting, we generate 1000 samples with sample size n=400, d=3 or 6, and $π = 0.2$, 0.5, or 0.8. When d=6, we set K=8 in the proposed EL method. We calculate the biases and standard deviations of the estimators under comparison, and summarise the results in Tables 1–3.
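The three data-generating settings can be sketched as follows. This is an illustrative sketch using NumPy, not the authors' simulation code; the function names `nct_noncentrality`, `draw_component` and `simulate` are hypothetical. The non-central t draws use the textbook representation $(Z + δ) / \sqrt{V / r}$ with $Z$ standard normal and $V$ an independent chi-square variable with r degrees of freedom, and the non-centrality $δ$ is chosen so that the component mean equals μ.

```python
import math
import numpy as np

def nct_noncentrality(mu, r):
    """Non-centrality delta for which a non-central t(r, delta) has mean mu (r > 1)."""
    return mu * math.sqrt(2.0 / r) * math.gamma(r / 2) / math.gamma((r - 1) / 2)

def draw_component(rng, family, mu, size, r=4):
    """Draw an array of the given size from one component with mean mu."""
    if family == "normal":
        return rng.normal(mu, 1.0, size)
    if family == "nct":
        delta = nct_noncentrality(mu, r)
        z = rng.standard_normal(size)
        v = rng.chisquare(r, size)
        return (z + delta) / np.sqrt(v / r)
    if family == "chisq":
        return rng.chisquare(mu, size)  # chi-square with mu degrees of freedom
    raise ValueError(f"unknown family: {family}")

def simulate(rng, family, pi, mu1, mu2, n=400, d=3):
    """n subjects with d conditionally i.i.d. measurements each, as in model (3)."""
    in_first = rng.random(n) < pi  # True means the subject comes from f_1
    first = draw_component(rng, family, mu1, (n, d))
    second = draw_component(rng, family, mu2, (n, d))
    return np.where(in_first[:, None], first, second)

# One replicate of the non-central t setting with d = 6.
x = simulate(np.random.default_rng(2020), "nct", pi=0.5, mu1=0.0, mu2=1.5, n=400, d=6)
```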

Table 1. Biases (%) and standard deviations (%) (in parentheses) of different estimators based on 1,000 simulations with n=400. Data were generated from the multivariate mixture model with $f_{1}$ and $f_{2}$ being $N (μ_{1}, 1)$ and $N (μ_{2}, 1)$ , respectively. Here $μ_{1} = 0$ , $μ_{2} = 1$ or 2 and d=3 or 6.


Table 2. Biases (%) and standard deviations (%) (in parentheses) of different estimators based on 1,000 simulations with n=400. Data were generated from the multivariate mixture model with $f_{1}$ and $f_{2}$ being $t (4, μ_{1})$ and $t (4, μ_{2} / \{\sqrt{2} Γ (3 / 2) / Γ (2)\})$, respectively. Here $μ_{1} = 0$, $μ_{2} = 1.5$ or 2 and d=3 or 6.


Table 3. Biases (%) and standard deviations (%) (in parentheses) of different estimators based on 1,000 simulations with n=400. Data were generated from the multivariate mixture model with $f_{1}$ and $f_{2}$ being $χ_{μ_{1}}^{2}$ and $χ_{μ_{2}}^{2}$ , respectively. Here $μ_{1} = 5$ , $μ_{2} = 10$ or 20 and d=3 or 6.


Let us first examine Table 1, where the multivariate normal mixture model is correctly specified. As expected, the ML estimators have the smallest standard deviations in all cases and the smallest absolute biases in most cases. The proposed EL estimators perform very similarly to the ML estimators and both of them are uniformly better than the MN estimators. As $μ_{2}$ moves further away from $μ_{1} = 0$, all estimators have decreasing standard deviations. This may be because the two component distributions in the mixture model also move further away from each other. When π increases from 0.2 to 0.8, the performance of all three estimators for $μ_{1}$ improves, while that for $μ_{2}$ deteriorates. This is probably because, as π increases, the multivariate normal mixture contains increasing information about $μ_{1}$ but decreasing information about $μ_{2}$. All three estimators for π perform better when π lies in the middle of its parameter space than when it is near the boundaries.

However, when data are generated from non-normal mixtures, the ML estimators lose their optimality. From Tables 2–3, we can see that compared with the MN estimators, they have smaller absolute biases in some cases, but larger standard deviations in other cases. The proposed EL estimators perform reasonably well as they have uniformly smaller biases and standard deviations than the other two competitors.

If the mixing proportion is of primary interest, we see that when the multivariate normal mixture is correctly specified, the ML estimator again performs the best and the EL estimator has almost the same performance. Both of them perform better than the MN estimator. When the model is misspecified, the EL estimator has the best performance, followed by the MN estimator. These two estimators usually outperform the ML estimator by a large margin. For example, in Table 2, when $π = 0.5$, $μ_{2} = 1.5$, and d=3, all three estimators for π have similar standard deviations; however, the ML estimator has a much larger absolute bias (0.3044) than the EL estimator (0.0056) and the MN estimator (0.0042).

When the data dimension d increases from 3 to 6, the standard deviations of both the EL and MN estimators become smaller, but the estimators behave differently in terms of bias. The absolute biases of the EL estimators always become smaller, whereas those of the ML and MN estimators do not. For example, in Table 3, when $π = 0.2$ and $μ_{2} = 10$, the absolute bias of the MN estimator for $μ_{2}$ increases from 0.5742 to 0.6359 and that of the ML estimator for $μ_{1}$ increases from 0.0366 to 0.2254. By contrast, that of the EL estimators for $(μ_{1}, μ_{2})$ decreases from $(0.0958, 0.0519)$ to $(0.0659, 0.0133)$.

Overall, the EL method exhibits more robust performance than the MN and ML methods across different model specifications. When the normal mixture is correctly specified, the proposed EL estimators have performance comparable to that of the ML estimators. When the normal mixture is misspecified, the EL estimators perform uniformly better than the other two competitors.

3.2. Data analysis

The reaction time (RT) task is one of the most common experimental methods in psychology for studying individual differences. In this section, we apply our proposed empirical likelihood method to an RT data set which was analysed by Cruz-Medina et al. (2004). In this experiment, 197 nine-year-old children were tested on a mental rotation task in which a target figure was presented on the left and another one on the right. The children had to determine whether the second figure was identical to the first or a mirror image of it. The RT was recorded in milliseconds. There were 6 trials, and we considered these trials as d=6 repeated measurements. The time delays between trials were randomly chosen so that the children would be unable to anticipate the length of the delays. The subsequent trials were then assumed to be independent. We display only the histogram of the first measurement of the data in Figure 1; those for the rest are similar. Cruz-Medina et al. (2004) suggested using a two-component mixture to fit the heterogeneous RT distribution.

Recorded in milliseconds, the RT values range from around 700 to 7000. For convenience, we re-scale them to seconds; the resulting numbers are no greater than 10. Although the mixing proportion π is of primary interest, we calculate the EL, MN and ML estimators for all three parameters $π, μ_{1}$ and $μ_{2}$. The results are tabulated in Table 4. Based on these point estimates, we also provide 95% Wald interval estimates for all three parameters, with variances estimated by 200 bootstrap repetitions.
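A Wald interval with a bootstrap variance estimate can be sketched as below. Several things here are assumptions made for illustration and are not stated in the article: subjects (rows) are resampled with replacement, the normal quantile 1.96 is used for the 95% level, and the data and the estimator are stand-ins (synthetic uniform values and the overall mean) rather than the RT data and the EL, MN or ML estimators. The function name `bootstrap_wald_ci` is hypothetical.

```python
import numpy as np

def bootstrap_wald_ci(data, estimator, n_boot=200, z=1.96, seed=0):
    """Wald interval est -/+ z * se, with se from a bootstrap over rows (subjects)."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    est = np.asarray(estimator(data), dtype=float)
    boots = np.array([estimator(data[rng.integers(0, n, size=n)])
                      for _ in range(n_boot)])
    se = boots.std(axis=0, ddof=1)  # bootstrap standard error
    return est - z * se, est + z * se

# Synthetic stand-in for 197 subjects with 6 trials each (NOT the RT data).
rng = np.random.default_rng(2)
rt_ms = rng.uniform(700.0, 7000.0, size=(197, 6))
rt_s = rt_ms / 1000.0  # re-scale milliseconds to seconds
lower, upper = bootstrap_wald_ci(rt_s, lambda a: a.mean())
```

Passing an estimator that returns a vector, such as estimates of $(π, μ_{1}, μ_{2})$, would yield componentwise lower and upper limits.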

Figure 1. Histogram of the first measurement of the RT data.

Table 4. Point and interval estimates of the EL, MN and ML methods for $π, μ_{1}$ and $μ_{2}$. EL$_{0}$: EL with $K = \binom{6}{3} = 20$; EL$_{1}$, EL$_{2}$, EL$_{3}$: EL with K=8; MN$_{1}$: MN with cut points $c_{1}, \dots, c_{10}$ being the deciles of the empirical distribution, which was suggested by Cruz-Medina et al. (2004) for general use; MN$_{2}$: MN with cut points $(c_{1}, \dots, c_{10}) = (0.5, 1, 1.2, 1.4, 1.6, 2, 2.5, 3, 4, 5)$, which was used by Cruz-Medina et al. (2004) when they analysed this dataset.


As mentioned in Section 2.2, the EL estimator depends on the K randomly selected sets $Ω_{k}^{*}$ ($k = 1, 2, \dots, K$). Therefore, we shall in general obtain different EL estimates when applying the EL method more than once if $K < \binom{d}{3}$. We apply the EL method with K=8 three times, and denote the results by EL$_{1}$, EL$_{2}$ and EL$_{3}$, respectively. In this example, d=6. When $K = \binom{6}{3} = 20$, the results are denoted by EL$_{0}$. We see that the EL estimates with K=8 are very close to those with K=20. This suggests that the proposed random selection strategy works very well here. The EL proportion estimates are all around 0.7, and the EL estimates for $μ_{1}$ and $μ_{2}$ are around 1.6 and 2.9, respectively.

When applying the MN method, we need to determine the cut points $c_{i}$. For general use, Cruz-Medina et al. (2004) suggested using 10 cut points and choosing $c_{1}, \dots, c_{10}$ to be the deciles of the empirical distribution of the data. The resulting MN method, denoted by MN$_{1}$, is also the MN method compared in our simulation study. When analysing the RT data, Cruz-Medina et al. (2004) used $(c_{1}, \dots, c_{10}) = (0.5, 1, 1.2, 1.4, 1.6, 2, 2.5, 3, 4, 5)$. We denote the resulting MN method by MN$_{2}$. It seems that the MN results depend to some extent on the choice of cut points, because the MN$_{1}$ proportion estimate (0.52) is quite different from that of MN$_{2}$ (0.59). Meanwhile, the MN$_{2}$ point and interval estimates are both nearly equal to those of the ML method.

According to our simulation studies, the EL method exhibits more robust performance than the MN and ML methods. This indicates that the EL analysis results are more trustworthy than those of the other two methods.

4. Discussions

In this paper, we proposed an empirical likelihood-based estimation method for the parameters of a multivariate two-component mixture model. We discussed three-variate mixtures in detail and extended the methodology to high-dimensional mixtures by giving a permutation-like method which reduces the high-dimensional problem to a three-dimensional situation. The performance and efficiency of the method are demonstrated through a real data example as well as simulation studies. The simulation results show that the proposed method is quite efficient in comparison to both completely parametric and almost nonparametric methods in the literature. Furthermore, the proposed method can accommodate parameter estimation in high-dimensional mixtures by requiring estimation only in three dimensions.

The extension of our approach to mixtures with more than two components is valuable and interesting. Similar to the two-component mixture situation, one can use a set of moment conditions implied by the mixture model to identify and estimate mixing proportions and other component parameters. When the number of components grows, the number of unknown parameters increases. The improvement in the performance of the proposed approach in terms of better identification and higher efficiency may crucially depend on the choice of the set of moment conditions. We will consider it in future research.

Acknowledgements

The authors would like to thank the editor, the AE, and the referee for their insightful comments and suggestions. The authors would like to thank Dr Jing Qin for valuable discussions and many helpful comments.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

The research is partially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grants (RGPIN-2018-05846, RGPIN-2018-05981), the National Natural Science Foundation of China (Grant Numbers 11771144, 11501354 and 11501208), and the Chinese 111 Project (B14019).

Notes on contributors

Yuejiao Fu

Yuejiao Fu is an Associate Professor of Statistics in the Department of Mathematics and Statistics at York University, Canada. She received her PhD in Statistics in 2004 from the University of Waterloo. Her research interests include mixture models, empirical likelihood, and statistical genetics.

Yukun Liu

Yukun Liu is a Professor in the School of Statistics, Faculty of Economic and Management, East China Normal University, China. He received his PhD in Statistics in 2009 from Nankai University, China. His research interests include nonparametric and semiparametric statistics based on empirical likelihood and their applications in case-control data, capture-recapture data, selection biased data, and finite mixture models.

Hsiao-Hsuan Wang

Hsiao-Hsuan Wang received her PhD in Statistics in 2010 from York University, Canada. She is now a director in Model Quantification, Enterprise Risk Management, CIBC, Canada.

Xiaogang Wang

Xiaogang Wang is a Professor in Statistics in the Department of Mathematics and Statistics of York University. He is also holding an adjunct position as a senior research fellow at the Institute of Data Science of Tsinghua University in Beijing. He received his PhD in Statistics from the University of British Columbia in 2001. His current research is on statistical analysis of complex data in health and life sciences.

References

Cruz-Medina, I. R., Hettmansperger, T. P., & Thomas, H. (2004). Semiparametric mixture models and repeated measures: The multinomial cut point model. Journal of the Royal Statistical Society: Series C (Applied Statistics), 53, 463–474. doi: 10.1111/j.1467-9876.2004.05203.x
Web of Science ®Google Scholar
Hall, P., & Zhou, X. H. (2003). Nonparametric estimation of component distributions in a multivariate mixture. The Annals of Statistics, 31, 201–224. doi: 10.1214/aos/1046294462
Hettmansperger, T. P., & Thomas, H. (2000). Almost nonparametric inference for repeated measures in mixture models. Journal of the Royal Statistical Society. Series B, 62, 811–825. doi: 10.1111/1467-9868.00266
Kasahara, H., & Shimotsu, K. (2014). Nonparametric identification and estimation of the number of components in multivariate mixtures. Journal of the Royal Statistical Society. Series B, 76(1), 97–111. doi: 10.1111/rssb.12022
Lindsay, B. G. (1995). Mixture models: Theory, geometry and applications. Hayward: Institute of Mathematical Statistics.
McLachlan, G. J., & Peel, D. (2000). Finite mixture models. New York: Wiley.
Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75, 237–249. doi: 10.1093/biomet/75.2.237
Owen, A. B. (2001). Empirical likelihood. New York: Chapman & Hall/CRC.
Qin, J., & Lawless, J. (1994). Empirical likelihood and general estimating equations. The Annals of Statistics, 22, 300–325. doi: 10.1214/aos/1176325370
Thomas, H., & Horton, J. J. (1997). Competency criteria and the class inclusion task: Modeling judgments and justifications. Developmental Psychology, 33, 1060–1073. doi: 10.1037/0012-1649.33.6.1060
Titterington, D. M., Smith, A. F. M., & Makov, U. E. (1985). Statistical analysis of finite mixture distributions. New York: Wiley.

Appendix

Since both Lemma 2.1 and Theorem 2.2 are established conditionally on the K selected sets $Ω_{k}^{*}$ ($k = 1, 2, \dots, K$), for convenience we regard the K selected sets as fixed sets throughout the proofs. Note that the $u_{k i}$'s are i.i.d. random vectors for fixed k and varying i, while they are not independent for fixed i and varying k.

Proof of Lemma 2.1

We consider $θ \in \{θ \mid ∥ θ - θ_{0} ∥ = n^{- 1 / 3}\}$, which can be rewritten as $θ = θ_{0} + n^{- 1 / 3} v$ with $∥ v ∥ = 1$. From Qin and Lawless (1994), we can show that $∥ λ_{k} ∥ = O (n^{- 1 / 3})$ and

$λ_{k} (θ) = \left\{\frac{1}{n} \sum_{i = 1}^{n} g (u_{k i}, θ) g^{T} (u_{k i}, θ)\right\}^{- 1} \left\{\frac{1}{n} \sum_{i = 1}^{n} g (u_{k i}, θ)\right\} + o (n^{- 1 / 3}) \quad (a.s.)$

uniformly in $θ \in \{θ \mid ∥ θ - θ_{0} ∥ \leq n^{- 1 / 3}\}$, for each $k = 1, \dots, K$. By Taylor's expansion, we have

$\begin{aligned} - ℓ (θ) & = \sum_{k = 1}^{K} \sum_{i = 1}^{n} \log \{1 + λ_{k}^{T} (θ) g (u_{k i}, θ)\} \\ & = \frac{n}{2} \sum_{k = 1}^{K} \left[\frac{1}{n} \sum_{i = 1}^{n} g (u_{k i}, θ)\right]^{T} \left[\frac{1}{n} \sum_{i = 1}^{n} g (u_{k i}, θ) g^{T} (u_{k i}, θ)\right]^{- 1} \left[\frac{1}{n} \sum_{i = 1}^{n} g (u_{k i}, θ)\right] + o (n^{1 / 3}) \quad (a.s.) \\ & = \frac{n}{2} \sum_{k = 1}^{K} \left[\frac{1}{n} \sum_{i = 1}^{n} g (u_{k i}, θ_{0}) + \frac{1}{n} \sum_{i = 1}^{n} \frac{\partial g (u_{k i}, θ_{0})}{\partial θ} v n^{- 1 / 3}\right]^{T} \\ & \quad \times \left[\frac{1}{n} \sum_{i = 1}^{n} g (u_{k i}, θ) g^{T} (u_{k i}, θ)\right]^{- 1} \\ & \quad \times \left[\frac{1}{n} \sum_{i = 1}^{n} g (u_{k i}, θ_{0}) + \frac{1}{n} \sum_{i = 1}^{n} \frac{\partial g (u_{k i}, θ_{0})}{\partial θ} v n^{- 1 / 3}\right] + o (n^{1 / 3}) \quad (a.s.) \\ & = \frac{n K}{2} \left[O (n^{- 1 / 2} (\log \log n)^{1 / 2}) + E \left(\frac{\partial g (u, θ_{0})}{\partial θ}\right) v n^{- 1 / 3}\right]^{T} \\ & \quad \times \left[E (g (u, θ_{0}) g^{T} (u, θ_{0}))\right]^{- 1} \\ & \quad \times \left[O (n^{- 1 / 2} (\log \log n)^{1 / 2}) + E \left(\frac{\partial g (u, θ_{0})}{\partial θ}\right) v n^{- 1 / 3}\right] + o (n^{1 / 3}) \quad (a.s.) \\ & \geq (c / 2) n^{1 / 3} \quad (a.s.), \end{aligned}$

where c is the smallest eigenvalue of

$E \left(\frac{\partial g (u, θ_{0})}{\partial θ}\right)^{T} \left[E (g (u, θ_{0}) g^{T} (u, θ_{0}))\right]^{- 1} E \left(\frac{\partial g (u, θ_{0})}{\partial θ}\right) .$

Similarly,

$\begin{aligned} - ℓ (θ_{0}) & = \frac{n}{2} \sum_{k = 1}^{K} \left[\frac{1}{n} \sum_{i = 1}^{n} g (u_{k i}, θ_{0})\right]^{T} \left[\frac{1}{n} \sum_{i = 1}^{n} g (u_{k i}, θ_{0}) g^{T} (u_{k i}, θ_{0})\right]^{- 1} \left[\frac{1}{n} \sum_{i = 1}^{n} g (u_{k i}, θ_{0})\right] + o (1) \quad (a.s.) \\ & = O (\log \log n) \quad (a.s.) \end{aligned}$

Hence, for n large, $ℓ (θ) < ℓ (θ_{0})$ (a.s.) at every $θ$ on the boundary $∥ θ - θ_{0} ∥ = n^{- 1 / 3}$. Since $ℓ (θ)$ is a continuous function of $θ$ on the ball $∥ θ - θ_{0} ∥ \leq n^{- 1 / 3}$, for n large $ℓ (θ)$ must have a maximum point $\hat{θ}$ in the interior of this ball such that

$\begin{aligned} \left. \frac{\partial ℓ (θ)}{\partial θ} \right|_{θ = \hat{θ}} & = - \sum_{k = 1}^{K} \sum_{i = 1}^{n} \left. \frac{(\partial λ_{k}^{T} (θ) / \partial θ) g (u_{k i}, θ) + (\partial g (u_{k i}, θ) / \partial θ)^{T} λ_{k} (θ)}{1 + λ_{k}^{T} (θ) g (u_{k i}, θ)} \right|_{θ = \hat{θ}} \\ & = - \sum_{k = 1}^{K} \sum_{i = 1}^{n} \left. \frac{1}{1 + λ_{k}^{T} (θ) g (u_{k i}, θ)} \left(\frac{\partial g (u_{k i}, θ)}{\partial θ}\right)^{T} λ_{k} (θ) \right|_{θ = \hat{θ}} \\ & = 0 . \end{aligned}$
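As a numerical illustration of the two expansions used in the proof of Lemma 2.1, the following sketch compares, for a single index k, the exact Lagrange multiplier with its first-order form $\{n^{-1} \sum_{i} g g^{T}\}^{-1} \{n^{-1} \sum_{i} g\}$, and the exact value of $\sum_{i} \log \{1 + λ^{T} g\}$ with the quadratic form $(n/2) \bar{g}^{T} \{n^{-1} \sum_{i} g g^{T}\}^{-1} \bar{g}$. It rests on assumptions that are not taken from the paper: the estimating function is the toy choice $g (u, θ) = u - θ$, the data are bivariate standard normal, and the helper names `el_multiplier` and `compare_expansions` are hypothetical. The multiplier is obtained by a damped Newton iteration on the usual empirical likelihood equation $\sum_{i} g_{i} / (1 + λ^{T} g_{i}) = 0$.

```python
import numpy as np


def el_multiplier(g, tol=1e-12, max_iter=50):
    """Solve mean_i g_i / (1 + lam' g_i) = 0 for lam by damped Newton steps."""
    n, r = g.shape
    lam = np.zeros(r)
    for _ in range(max_iter):
        denom = 1.0 + g @ lam
        score = (g / denom[:, None]).mean(axis=0)
        if np.linalg.norm(score) < tol:
            break
        # Negative Jacobian of the score: mean of g g' / (1 + lam' g)^2.
        hess = (g / denom[:, None] ** 2).T @ g / n
        step = np.linalg.solve(hess, score)
        # Halve the step until every 1 + lam' g_i stays positive.
        while np.any(1.0 + g @ (lam + step) <= 0.0):
            step = step / 2.0
        lam = lam + step
    return lam


def compare_expansions(n=2000, seed=1):
    """Compare exact and first-order quantities at a point on the sphere."""
    rng = np.random.default_rng(seed)
    theta0 = np.zeros(2)
    # Toy data (assumption): not the u_ki of the paper.
    u = rng.normal(loc=theta0, scale=1.0, size=(n, 2))
    v = np.array([1.0, 0.0])                    # unit direction, ||v|| = 1
    theta = theta0 + n ** (-1.0 / 3.0) * v      # ||theta - theta0|| = n^(-1/3)
    g = u - theta                               # toy g(u, theta) = u - theta

    lam_exact = el_multiplier(g)
    g_bar = g.mean(axis=0)
    s_hat = g.T @ g / n
    lam_approx = np.linalg.solve(s_hat, g_bar)  # first-order form of lambda

    neg_ell_exact = np.sum(np.log1p(g @ lam_exact))   # sum_i log(1 + lam' g_i)
    neg_ell_approx = 0.5 * n * g_bar @ lam_approx     # (n/2) g_bar' S^{-1} g_bar
    residual = np.linalg.norm(
        (g / (1.0 + g @ lam_exact)[:, None]).mean(axis=0)
    )
    return lam_exact, lam_approx, neg_ell_exact, neg_ell_approx, residual
```

Calling `compare_expansions()` returns both multipliers, both values of the log term, and the residual of the multiplier equation. In this toy setting each pair should agree closely, which is what the $o (\cdot)$ remainders in the proof express.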

Proof of Theorem 2.2

Taking derivatives with respect to $θ$ and $λ^{T}$, we have

$\begin{aligned} \frac{\partial Q_{k n} (θ, 0)}{\partial θ} & = \frac{1}{n} \sum_{i = 1}^{n} \frac{\partial g (u_{k i}, θ)}{\partial θ}, & \frac{\partial Q_{k n} (θ, 0)}{\partial λ_{j}^{T}} & = - \frac{1}{n} \sum_{i = 1}^{n} g (u_{k i}, θ) g^{T} (u_{j i}, θ) δ_{k j}, \\ \frac{\partial Q_{0 n} (θ, 0)}{\partial θ} & = 0, & \frac{\partial Q_{0 n} (θ, 0)}{\partial λ_{k}^{T}} & = \frac{1}{n} \sum_{i = 1}^{n} \left(\frac{\partial g (u_{k i}, θ)}{\partial θ}\right)^{T}, \end{aligned}$

for $k, j = 1, \dots, K$, where $δ_{k j}$ is the Kronecker delta. Expanding $Q_{k n} (\hat{θ}, \hat{λ})$ and $Q_{0 n} (\hat{θ}, \hat{λ})$ at $(θ_{0}, 0)$, we have

$\begin{aligned} 0 & = Q_{k n} (\hat{θ}, {\hat{λ}}_{k}) = Q_{k n} (θ_{0}, 0) + \frac{\partial Q_{k n} (θ_{0}, 0)}{\partial λ_{k}^{T}} ({\hat{λ}}_{k} - 0) + \frac{\partial Q_{k n} (θ_{0}, 0)}{\partial θ} (\hat{θ} - θ_{0}) + o_{p} (δ_{n}), \\ 0 & = Q_{0 n} (\hat{θ}, \hat{λ}) = Q_{0 n} (θ_{0}, 0) + \sum_{k = 1}^{K} \frac{\partial Q_{0 n} (θ_{0}, 0)}{\partial λ_{k}^{T}} ({\hat{λ}}_{k} - 0) + \frac{\partial Q_{0 n} (θ_{0}, 0)}{\partial θ} (\hat{θ} - θ_{0}) + o_{p} (δ_{n}), \end{aligned}$

where $δ_{n} = ∥ \hat{θ} - θ_{0} ∥ + \sum_{k = 1}^{K} ∥ {\hat{λ}}_{k} ∥$ .

It follows from the above equations that

$\begin{pmatrix} \hat{λ} \\ \hat{θ} - θ_{0} \end{pmatrix} = S_{n}^{- 1} \begin{pmatrix} D_{n} \\ 0 \end{pmatrix} + o_{p} (δ_{n}) . \qquad \text{(A1)}$

Here

$D_{n} = \begin{pmatrix} Q_{1 n} (θ_{0}, 0) \\ \vdots \\ Q_{K n} (θ_{0}, 0) \end{pmatrix}, \quad S_{n} = \begin{pmatrix} S_{11 n} & S_{12 n} \\ S_{21 n} & S_{22 n} \end{pmatrix},$

where

$\begin{aligned} S_{11 n} & = \left(- \frac{\partial Q_{k n} (θ_{0}, 0)}{\partial λ_{j}^{T}}\right)_{1 \leq j, k \leq K} \\ & = \mathrm{diag} \left(\frac{1}{n} \sum_{i = 1}^{n} g (u_{1 i}, θ_{0}) g^{T} (u_{1 i}, θ_{0}), \dots, \frac{1}{n} \sum_{i = 1}^{n} g (u_{K i}, θ_{0}) g^{T} (u_{K i}, θ_{0})\right), \\ S_{12 n} & = - \left(\frac{\partial Q_{1 n} (θ_{0}, 0)}{\partial θ}, \dots, \frac{\partial Q_{K n} (θ_{0}, 0)}{\partial θ}\right)^{T} \\ & = - \left(\frac{1}{n} \sum_{i = 1}^{n} \frac{\partial g (u_{1 i}, θ_{0})}{\partial θ}, \dots, \frac{1}{n} \sum_{i = 1}^{n} \frac{\partial g (u_{K i}, θ_{0})}{\partial θ}\right)^{T}, \end{aligned}$

$S_{21 n} = S_{12 n}^{T}$ and $S_{22 n} = - \partial Q_{0 n} (θ_{0}, 0) / \partial θ = 0$.

Define $\mathbf{S}_{11} = I_{K} \otimes S_{11}$ and $\mathbf{S}_{12} = 1_{K} \otimes S_{12}$, where ⊗ is the Kronecker product operator and boldface distinguishes the stacked matrices from the blocks $S_{11}$ and $S_{12}$. Under the conditions of Theorem 2.2, as $n \to \infty$, it can be verified that

$S_{11 n} = \mathbf{S}_{11} + o_{p} (1), \quad S_{12 n} = \mathbf{S}_{12} + o_{p} (1),$

and therefore $S_{n} = \mathbf{S} + o_{p} (1)$, where

$\mathbf{S} = \begin{pmatrix} \mathbf{S}_{11} & \mathbf{S}_{12} \\ \mathbf{S}_{21} & \mathbf{S}_{22} \end{pmatrix} = \begin{pmatrix} I_{K} \otimes S_{11} & 1_{K} \otimes S_{12} \\ 1_{K}^{T} \otimes S_{12}^{T} & 0 \end{pmatrix} .$

In addition, $\sqrt{n} D_{n}$ converges in distribution to $N (0, Σ)$, where

$Σ = \left(E \{g (u_{k 1}) g^{T} (u_{j 1}) \mid Ω^{*}\}\right)_{1 \leq k, j \leq K} .$

Therefore, $δ_{n} = O_{p} (n^{- 1 / 2})$. Since the inverse of $\mathbf{S}$ is

$\mathbf{S}^{- 1} = \begin{pmatrix} \mathbf{S}_{11}^{- 1} + \mathbf{S}_{11}^{- 1} \mathbf{S}_{12} \mathbf{S}_{22.1}^{- 1} \mathbf{S}_{21} \mathbf{S}_{11}^{- 1} & - \mathbf{S}_{11}^{- 1} \mathbf{S}_{12} \mathbf{S}_{22.1}^{- 1} \\ - \mathbf{S}_{22.1}^{- 1} \mathbf{S}_{21} \mathbf{S}_{11}^{- 1} & \mathbf{S}_{22.1}^{- 1} \end{pmatrix},$

where $\mathbf{S}_{22.1} = - \mathbf{S}_{21} \mathbf{S}_{11}^{- 1} \mathbf{S}_{12}$, we further have

$\sqrt{n} (\hat{θ} - θ_{0}) = - \mathbf{S}_{22.1}^{- 1} \mathbf{S}_{21} \mathbf{S}_{11}^{- 1} \cdot \sqrt{n} D_{n} + o_{p} (1),$

which converges in distribution to $N (0, V_{2})$ with

$V_{2} = \mathbf{S}_{22.1}^{- 1} \mathbf{S}_{21} \mathbf{S}_{11}^{- 1} Σ \mathbf{S}_{11}^{- 1} \mathbf{S}_{12} \mathbf{S}_{22.1}^{- 1} . \qquad \text{(A2)}$

With some algebra, it can be seen that $\mathbf{S}_{22.1} = - K S_{21} S_{11}^{- 1} S_{12}$ and $\mathbf{S}_{21} \mathbf{S}_{11}^{- 1} = 1_{K}^{T} \otimes (S_{21} S_{11}^{- 1})$, which implies

$\begin{aligned} V_{2} & = K^{- 2} (S_{21} S_{11}^{- 1} S_{12})^{- 1} \{1_{K}^{T} \otimes (S_{21} S_{11}^{- 1})\} Σ \{1_{K} \otimes (S_{11}^{- 1} S_{12})\} (S_{21} S_{11}^{- 1} S_{12})^{- 1} \\ & = K^{- 2} (S_{21} S_{11}^{- 1} S_{12})^{- 1} (S_{21} S_{11}^{- 1}) \left[\sum_{k, j = 1}^{K} E \{g (u_{k 1}) g^{T} (u_{j 1}) \mid Ω^{*}\}\right] (S_{11}^{- 1} S_{12}) (S_{21} S_{11}^{- 1} S_{12})^{- 1} \\ & = K^{- 1} (S_{21} S_{11}^{- 1} S_{12})^{- 1} + \frac{K - 1}{K} (S_{21} S_{11}^{- 1} S_{12})^{- 1} (S_{21} S_{11}^{- 1}) Σ_{o f f} (S_{11}^{- 1} S_{12}) (S_{21} S_{11}^{- 1} S_{12})^{- 1} . \end{aligned}$

This completes the proof of Theorem 2.2.
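The Kronecker algebra in the last steps can be checked numerically. The sketch below builds the stacked matrices $I_{K} \otimes S_{11}$ and $1_{K} \otimes S_{12}$ from randomly generated blocks and verifies three things: the partitioned inverse formula, the identity that the Schur complement equals $- K S_{21} S_{11}^{- 1} S_{12}$, and the agreement of (A2) with the final simplified expression for $V_{2}$. Several ingredients are assumptions made only for illustration: the dimensions `K`, `r` and `p`, the random blocks, the function name `check_theorem_algebra`, and in particular the reading of $Σ_{o f f}$ as the average of the off-diagonal blocks of $Σ$, which is the reading that makes the factor $(K - 1) / K$ come out.

```python
import numpy as np


def check_theorem_algebra(K=4, r=3, p=2, seed=0):
    """Numerically check the block-matrix steps with random inputs."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=(r, r))
    s11 = a @ a.T + r * np.eye(r)       # hypothetical block S_11 (r x r, SPD)
    s12 = rng.normal(size=(r, p))       # hypothetical block S_12 (r x p)
    s21 = s12.T

    # Sigma: K x K grid of r x r blocks, diagonal blocks equal to S_11,
    # arbitrary off-diagonal blocks with Sigma_jk = Sigma_kj'.
    sigma = np.zeros((K * r, K * r))
    off_sum = np.zeros((r, r))
    for k in range(K):
        sigma[k * r:(k + 1) * r, k * r:(k + 1) * r] = s11
        for j in range(k + 1, K):
            c = 0.1 * rng.normal(size=(r, r))
            sigma[k * r:(k + 1) * r, j * r:(j + 1) * r] = c
            sigma[j * r:(j + 1) * r, k * r:(k + 1) * r] = c.T
            off_sum += c + c.T
    # Assumption: Sigma_off is the average of the K(K-1) off-diagonal blocks.
    sigma_off = off_sum / (K * (K - 1))

    # Stacked (Kronecker-structured) matrices.
    big11 = np.kron(np.eye(K), s11)
    big12 = np.kron(np.ones((K, 1)), s12)
    big21 = big12.T
    big11_inv = np.linalg.inv(big11)
    schur = -big21 @ big11_inv @ big12  # S_22.1, since S_22 = 0
    schur_inv = np.linalg.inv(schur)

    # Partitioned inverse formula applied to S.
    s_full = np.block([[big11, big12], [big21, np.zeros((p, p))]])
    s_inv = np.block([
        [big11_inv + big11_inv @ big12 @ schur_inv @ big21 @ big11_inv,
         -big11_inv @ big12 @ schur_inv],
        [-schur_inv @ big21 @ big11_inv, schur_inv],
    ])

    # V_2 from (A2) and from the simplified final expression.
    v2_full = schur_inv @ big21 @ big11_inv @ sigma @ big11_inv @ big12 @ schur_inv
    s11_inv = np.linalg.inv(s11)
    m = s21 @ s11_inv @ s12
    m_inv = np.linalg.inv(m)
    v2_simple = m_inv / K + (K - 1) / K * (
        m_inv @ s21 @ s11_inv @ sigma_off @ s11_inv @ s12 @ m_inv
    )
    return {
        "schur": schur,
        "m": m,
        "s_times_inv": s_full @ s_inv,
        "v2_full": v2_full,
        "v2_simple": v2_simple,
    }
```

Comparing the returned pairs with `np.allclose` (for example `out["v2_full"]` against `out["v2_simple"]`) gives a quick sanity check of the derivation. The check is purely algebraic and does not depend on $Σ$ being a valid covariance matrix.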

Table 1. Biases (%) and standard deviations (%) (in parentheses) of different estimators based on 1,000 simulations with n=400. Data were generated from the multivariate mixture model with f1 and f2 being N(μ1,1) and N(μ2,1), respectively. Here μ1=0, μ2=1 or 2 and d=3 or 6.

Table 2. Biases (%) and standard deviations (%) (in parentheses) of different estimators based on 1,000 simulations with n=400. Data were generated from the multivariate mixture model with f1 and f2 being t(4,μ1) and t(4,μ2/{2Γ(3/2)/Γ(2)}), respectively. Here μ1=0, μ2=1.5 or 2 and d=3 or 6.

Table 3. Biases (%) and standard deviations (%) (in parentheses) of different estimators based on 1,000 simulations with n=400. Data were generated from the multivariate mixture model with f1 and f2 being $χ_{μ_{1}}^{2}$ and $χ_{μ_{2}}^{2}$, respectively. Here μ1=5, μ2=10 or 20 and d=3 or 6.
