Short Communications

Empirical likelihood estimation in multivariate mixture models with repeated measurements

Yuejiao Fu, Yukun Liu, Hsiao-Hsuan Wang & Xiaogang Wang
Pages 152-160 | Received 12 Nov 2018, Accepted 07 Jun 2019, Published online: 19 Jun 2019

Abstract

Multivariate mixtures are encountered in situations where the data are repeated or clustered measurements in the presence of heterogeneity among the observations with unknown proportions. In such situations, the main interest may be not only in estimating the component parameters, but also in obtaining reliable estimates of the mixing proportions. In this paper, we propose an empirical likelihood approach combined with a novel dimension reduction procedure for estimating parameters of a two-component multivariate mixture model. The performance of the new method is compared to fully parametric as well as almost nonparametric methods used in the literature.

1. Introduction

Mixture models provide a flexible way of modelling complex data obtained from a population with observed or unobserved heterogeneity. Mixture models have been applied in astronomy, biology, fishery, human genetics, and other scientific areas of research. See Titterington, Smith, and Makov (1985), Lindsay (1995), McLachlan and Peel (2000), and references therein.

We consider a special multivariate mixture model where repeated measurements are available for each subject. Let $X_1,\dots,X_n$ be independent and identically distributed (i.i.d.) $d$-variate random vectors from a finite mixture model with $m$ components. If the elements of the vector $X_i$ are independent conditional on belonging to a subpopulation, then the mixture density is given by
$$h(\mathbf{x})=\sum_{j=1}^{m}\pi_j\prod_{r=1}^{d}f_{jr}(x_r), \tag{1}$$
where the $\pi_j$'s are mixing proportions such that $\sum_{j=1}^{m}\pi_j=1$ and $\pi_j>0$ for all $j$, and $f(\cdot)$, with or without subscripts, denotes a univariate density function.

The above data structure is quite common especially in social sciences where measurements are taken repeatedly for various reasons. For example, the goal of research on preschool children's inclusion task responses is to study the different solution strategies with which young children solve a given cognitive task. The solution strategy is often called the latent variable since it is hidden and unobservable. A group of preschool children can be considered as a sample from a mixture model where the components correspond to the various solution strategies; see Thomas and Horton (1997). In a simplified setting, one could assume that there are two main solution strategies, which lead to a mixture model with two components.

Many researchers have studied the nonparametric identifiability of the above multivariate mixture model. Hall and Zhou (2003) showed that model (1) is always nonparametrically unidentifiable when $d=2$ and $m=2$. Under some mild regularity conditions, Hall and Zhou (2003) proved that the two-component mixture model is nonparametrically identifiable for $d\ge 3$. Kasahara and Shimotsu (2014) discussed the identifiability of the number of components in multivariate mixture models in which each component distribution has independent marginals. Hettmansperger and Thomas (2000) considered the situation where the elements of the vector $X_i$ are not only conditionally independent, but also identically distributed. Under such an assumption, the mixture density (1) can be rewritten as
$$h(\mathbf{x})=\sum_{j=1}^{m}\pi_j\prod_{r=1}^{d}f_j(x_r). \tag{2}$$
They proposed an almost nonparametric approach to estimate the mixing proportions. Their key idea is to categorise data into 0 or 1 by setting an optimal cut point and then apply the EM algorithm to estimate the mixing proportion in the resulting binomial mixture models. Cruz-Medina, Hettmansperger, and Thomas (2004) extended the work of Hettmansperger and Thomas (2000) by transforming the observed vector into a count vector, which leads to a multinomial mixture model.

To avoid the possible loss of efficiency in categorising continuous data into count data, we propose a nonparametric approach to estimate the mixing proportions using empirical likelihood (EL). The EL, first introduced by Owen (1988), is a nonparametric method of inference based on a data-driven likelihood ratio function. This nonparametric and likelihood-based approach has become one of the most effective statistical methods; see Owen (2001) for a comprehensive review. As shown in Qin and Lawless (1994), the EL is an efficient and prominent tool for estimating parameters by incorporating estimating equations into the constrained maximisation of the empirical likelihood function.

We first develop the proposed methodology for three-dimensional mixture models, and later extend it to higher dimensions. For the multivariate mixture model, we propose linking the various moment estimating equations through the EL to provide more efficient estimation. In the $d$-dimensional mixture model, there are $2^d-1$ moment estimating equations. When $d$ is large, it is impracticable to search for the optimal solution. We propose a simple and intuitive bootstrap-like modification of the method. First we obtain $K$ sets of three indices chosen randomly and without replacement from $\{1,2,\dots,d\}$, and then multiply the $K$ nonparametric likelihoods pertinent to the chosen indices to obtain the profile empirical likelihood ratio function.

Our simulation results show that, when the parametric model is correctly specified, our EL estimators perform similarly to the parametric estimators. However, when the parametric model is misspecified, the EL estimators perform uniformly better than the parametric estimators and the almost nonparametric estimators.

The paper is organised as follows. The proposed empirical likelihood approach for the multivariate mixture model and its theoretical properties are presented in Section 2, together with the extension to $d$-dimensional ($d>3$) mixtures. Simulation studies and a real data analysis are provided in Section 3. A discussion is given in Section 4.

2. Methodology

We first discuss the methodology for the three-variate mixture model, and then extend to multivariate mixtures with higher dimensions.

2.1. Three-variate mixture model

Let $X=(X_1,X_2,X_3)^T$ be a 3-dimensional random vector with distribution function $H(x)$ and joint probability density function
$$h(\mathbf{x})=\pi\prod_{i=1}^{3}f_1(x_i)+(1-\pi)\prod_{i=1}^{3}f_2(x_i), \tag{3}$$
where $0\le\pi\le 1$, and the component density functions $f_1$ and $f_2$ are different but unspecified. This model is a special case of model (2) with $m=2$ and $d=3$.

The parameters of interest are the expectations of the random variables and the mixing proportion $\pi$. Suppose $\mu_0$ and $\mu_1$ are the expected values of the two components:
$$\mu_0=\int x f_1(x)\,dx,\qquad \mu_1=\int x f_2(x)\,dx,$$
and that they satisfy $\mu_0<\mu_1$. We then have the following moment estimating equations:
$$E(X_1X_2X_3)=\pi\mu_0^3+(1-\pi)\mu_1^3,$$
$$E(X_1X_2)=E(X_1X_3)=E(X_2X_3)=\pi\mu_0^2+(1-\pi)\mu_1^2,$$
$$E(X_1)=E(X_2)=E(X_3)=\pi\mu_0+(1-\pi)\mu_1.$$
There are seven estimating equations in total with three unknown parameters $(\pi,\mu_0,\mu_1)$.
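The seven moment conditions translate directly into an estimating-function vector. Below is a minimal sketch in Python with NumPy (the language and the function name `g` are our choices; the paper gives only the mathematics):

```python
import numpy as np

def g(x, theta):
    """Stack the seven moment estimating functions for one observation
    x = (x1, x2, x3) at theta = (pi, mu0, mu1)."""
    pi, mu0, mu1 = theta
    x1, x2, x3 = x
    m3 = pi * mu0**3 + (1 - pi) * mu1**3   # E(X1 X2 X3)
    m2 = pi * mu0**2 + (1 - pi) * mu1**2   # E(Xr Xs), r < s
    m1 = pi * mu0 + (1 - pi) * mu1         # E(Xr)
    return np.array([x1 * x2 * x3 - m3,
                     x1 * x2 - m2, x1 * x3 - m2, x2 * x3 - m2,
                     x1 - m1, x2 - m1, x3 - m1])
```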

Let $x_i=(x_{i1},x_{i2},x_{i3})^T$, $i=1,\dots,n$, be i.i.d. observations from the multivariate mixture model (3), and let $p_i=dH(x_i)$. According to Owen (1988), the EL function based on the observed data is
$$\prod_{i=1}^n dH(x_i)=\prod_{i=1}^n p_i. \tag{4}$$
Let $\theta=(\pi,\mu_0,\mu_1)^T$. For the distribution $H(x)$ under study, feasible $p_i$'s satisfy
$$\sum_{i=1}^n p_i=1,\qquad p_i\ge 0,\qquad\text{and}\qquad \sum_{i=1}^n p_i\,g(x_i,\theta)=0, \tag{5}$$
where
$$g(x_i,\theta)=\bigl(g_1(x_i,\theta),\ g_2^T(x_i,\theta),\ g_3^T(x_i,\theta)\bigr)^T \tag{6}$$
with $g_1(x_i,\theta)=x_{i1}x_{i2}x_{i3}-\pi\mu_0^3-(1-\pi)\mu_1^3$,
$$g_2(x_i,\theta)=\begin{pmatrix}x_{i1}x_{i2}-\pi\mu_0^2-(1-\pi)\mu_1^2\\ x_{i1}x_{i3}-\pi\mu_0^2-(1-\pi)\mu_1^2\\ x_{i2}x_{i3}-\pi\mu_0^2-(1-\pi)\mu_1^2\end{pmatrix}
\quad\text{and}\quad
g_3(x_i,\theta)=\begin{pmatrix}x_{i1}-\pi\mu_0-(1-\pi)\mu_1\\ x_{i2}-\pi\mu_0-(1-\pi)\mu_1\\ x_{i3}-\pi\mu_0-(1-\pi)\mu_1\end{pmatrix}.$$
Inference on $\theta$ is usually made through the profile likelihood, obtained by maximising (4) with respect to the $p_i$'s subject to the constraints in (5). Up to a constant not depending on $\theta$, the resulting empirical log-likelihood is
$$\ell(\theta)=-\sum_{i=1}^n\log\{1+\lambda^T g(x_i,\theta)\},$$
where $\lambda$ is the Lagrange multiplier determined by
$$\frac{1}{n}\sum_{i=1}^n\frac{g(x_i,\theta)}{1+\lambda^T g(x_i,\theta)}=0.$$
We can show that, in an $O(n^{-1/3})$ neighbourhood of the true value of $\theta$, $\lambda=\lambda(\theta)$ is uniquely determined as an implicit function of $\theta$. We denote the maximum empirical likelihood estimators by $\hat\theta=(\hat\pi,\hat\mu_0,\hat\mu_1)^T$. Their asymptotic properties are given in the following theorem of Qin and Lawless (1994). When $\theta$ takes its true value $\theta_0$, we write $g(x,\theta_0)$ as $g(x)$ for short.
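For a fixed $\theta$, the inner maximisation reduces to solving for $\lambda(\theta)$ and plugging it in. A hedged sketch, reusing `g` from above: `solve_lambda` runs unsafeguarded Newton iterations on the concave dual (a production implementation would guard the constraint $1+\lambda^T g_i>0$), and the outer maximisation of $\ell(\theta)$ is handed to a derivative-free optimiser:

```python
import numpy as np
from scipy.optimize import minimize

def solve_lambda(G, n_iter=50):
    """Newton iterations for the Lagrange multiplier solving
    (1/n) sum_i g_i / (1 + lambda' g_i) = 0, where G is the n x 7
    matrix with rows g(x_i, theta). No step safeguarding, for brevity."""
    lam = np.zeros(G.shape[1])
    for _ in range(n_iter):
        w = 1.0 + G @ lam                       # 1 + lambda' g_i
        score = (G / w[:, None]).mean(axis=0)   # gradient of the dual
        hess = -(G.T / w**2) @ G / len(G)       # Hessian of the dual
        lam -= np.linalg.solve(hess, score)
    return lam

def neg_log_el(theta, X):
    """-ell(theta) = sum_i log(1 + lambda' g_i); minimising this
    maximises the profile empirical log-likelihood."""
    G = np.array([g(x, theta) for x in X])
    return np.log1p(G @ solve_lambda(G)).sum()

# example usage (X: an n x 3 data array):
# theta_hat = minimize(neg_log_el, x0=[0.5, 0.0, 1.0], args=(X,),
#                      method="Nelder-Mead").x
```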

Theorem 2.1

Under the regularity conditions specified in Qin and Lawless (1994), as $n$ goes to infinity, $\sqrt{n}(\hat\theta-\theta_0)\xrightarrow{d}N(0,V_1)$, where
$$V_1=\left[E\left\{\frac{\partial g(X)}{\partial\theta^T}\right\}^T\left\{E\,g(X)g^T(X)\right\}^{-1}E\left\{\frac{\partial g(X)}{\partial\theta^T}\right\}\right]^{-1}.$$
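Theorem 2.1 also suggests a plug-in variance estimator: replace the expectations in $V_1$ by sample means and invert the sandwich. A sketch, with the Jacobian approximated by central finite differences (a numerical convenience we assume, not part of the paper):

```python
import numpy as np

def asymptotic_cov(theta_hat, X, eps=1e-5):
    """Plug-in estimate of V1: inverse of (E dg/dtheta')' {E gg'}^{-1} (E dg/dtheta'),
    with expectations replaced by sample means over the data X."""
    G = np.array([g(x, theta_hat) for x in X])        # n x 7
    S11 = G.T @ G / len(X)                            # estimates E{g g'}
    S12 = np.zeros((G.shape[1], len(theta_hat)))      # estimates E{dg/dtheta'}
    for j in range(len(theta_hat)):
        tp, tm = np.array(theta_hat, float), np.array(theta_hat, float)
        tp[j] += eps; tm[j] -= eps
        Gp = np.array([g(x, tp) for x in X])
        Gm = np.array([g(x, tm) for x in X])
        S12[:, j] = (Gp - Gm).mean(axis=0) / (2 * eps)
    return np.linalg.inv(S12.T @ np.linalg.solve(S11, S12))

# standard errors: np.sqrt(np.diag(asymptotic_cov(theta_hat, X)) / len(X))
```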

With $(\mathbb{1}(X_1\le t),\mathbb{1}(X_2\le t),\mathbb{1}(X_3\le t))$ in place of $X$, we can estimate the underlying distribution functions $F_1(t)$ and $F_2(t)$. The asymptotic normality of the resulting empirical likelihood estimators can be established in a way similar to Theorem 2.1.

2.2. Multivariate mixtures with higher dimensions

We now extend the methodology discussed in the previous section to the case with $d>3$. Suppose the $d$-variate data $w_i=(w_{i1},\dots,w_{id})^T$, $i=1,\dots,n$, arise from the mixture model with mixture density
$$h(w_i)=\pi\prod_{j=1}^{d}f_1(w_{ij})+(1-\pi)\prod_{j=1}^{d}f_2(w_{ij}).$$
In principle, we can adopt the same approach as in the case $d=3$ in order to make inferences about $\theta$. When $d$ is large, however, the number of estimating equations we must deal with is
$$\binom{d}{d}+\binom{d}{d-1}+\cdots+\binom{d}{1}=2^d-1,$$
which can be extremely large. Consequently, it is impractical to search for the optimal solution when that many estimating equations are embraced in the empirical likelihood setup.

We now propose a simple and intuitive solution to the high-dimensional problem. Let $M_d=\binom{d}{3}$, and let $\Omega_i$ $(i=1,2,\dots,M_d)$ be all possible samples of size 3 from $\{1,2,\dots,d\}$ drawn by simple random sampling without replacement. We randomly select $K$ sets from $\{\Omega_1,\dots,\Omega_{M_d}\}$ by simple random sampling without replacement. Let $\Omega_k=\{s_{k1},s_{k2},s_{k3}\}$ $(k=1,2,\dots,K)$ be the resulting $K$ index sets, and let $u_{ki}=(x_{ki},y_{ki},z_{ki})^T$ denote $(w_{i,s_{k1}},w_{i,s_{k2}},w_{i,s_{k3}})^T$. We assume $s_{k1}<s_{k2}<s_{k3}$ for each $k$, and treat the data with different $\Omega_k$ as independent samples. The profile empirical likelihood ratio function of $\theta$ based on the selected index sets is
$$R(\theta)=\max\left\{\prod_{k=1}^K\prod_{i=1}^n (np_{ki})\ \middle|\ \sum_{i=1}^n p_{ki}=1,\ p_{ki}\ge 0,\ \sum_{i=1}^n p_{ki}\,g(u_{ki},\theta)=0,\ k=1,\dots,K\right\},$$
where the function $g$ is defined in (6).

Applying the method of constrained optimisation, we have
$$G=\sum_{k=1}^K\sum_{i=1}^n\log(np_{ki})-n\sum_{k=1}^K\sum_{i=1}^n p_{ki}\,\lambda_k^T g(u_{ki},\theta)+\sum_{k=1}^K\gamma_k\left(\sum_{i=1}^n p_{ki}-1\right),$$
where $\lambda_k$ and $\gamma_k$ are Lagrange multipliers. Setting the first derivative of $G$ with respect to $p_{ki}$ to zero, we have
$$\frac{\partial G}{\partial p_{ki}}=\frac{1}{p_{ki}}-n\lambda_k^T g(u_{ki},\theta)+\gamma_k=0.$$
Multiplying both sides of the above equation by $p_{ki}$ and summing over $i$ give
$$\sum_{i=1}^n p_{ki}\frac{\partial G}{\partial p_{ki}}=n+\gamma_k=0,$$
which leads to $\gamma_k=-n$. Therefore, the maximum of $\prod_{k=1}^K\prod_{i=1}^n(np_{ki})$ is attained at
$$\hat p_{ki}=\frac{1}{n}\cdot\frac{1}{1+\lambda_k^T g(u_{ki},\theta)},\qquad k=1,\dots,K,$$
where the Lagrange multipliers $\lambda_k=\lambda_k(\theta)$ are the solutions to
$$\frac{1}{n}\sum_{i=1}^n\frac{g(u_{ki},\theta)}{1+\lambda_k^T g(u_{ki},\theta)}=0.$$
Putting $\hat p_{ki}$ back and taking the logarithm, we have the profile empirical log-likelihood ratio function of $\theta$,
$$\ell(\theta)=\log\{R(\theta)\}=-\sum_{k=1}^K\sum_{i=1}^n\log\{1+\lambda_k^T g(u_{ki},\theta)\}.$$
We show that, with probability tending to one, there must be a local maximum point in a very small neighbourhood of the true parameter value of $\theta$. Let $\Omega=\{\Omega_1,\dots,\Omega_K\}$.
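Computationally, the high-dimensional procedure only adds an outer loop over the $K$ index triples, which are drawn once and then held fixed (matching the conditioning on $\Omega$). A sketch reusing `g` and `solve_lambda` from Section 2.1; `draw_index_sets` and `neg_log_el_highdim` are our illustrative names:

```python
import numpy as np
from itertools import combinations

def draw_index_sets(d, K, rng):
    """Draw K of the C(d,3) index triples by simple random sampling
    without replacement; keep them fixed afterwards."""
    triples = list(combinations(range(d), 3))
    chosen = rng.choice(len(triples), size=K, replace=False)
    return [triples[c] for c in chosen]

def neg_log_el_highdim(theta, W, index_sets):
    """-ell(theta) = sum_k sum_i log(1 + lambda_k' g(u_ki, theta)),
    combining the K per-triple empirical likelihoods; W is n x d."""
    total = 0.0
    for s in index_sets:
        G = np.array([g(w[list(s)], theta) for w in W])  # rows are u_ki
        total += np.log1p(G @ solve_lambda(G)).sum()
    return total

# Omega = draw_index_sets(d=6, K=8, rng=np.random.default_rng(0))
# theta_hat = minimize(neg_log_el_highdim, [0.5, 0.0, 1.0],
#                      args=(W, Omega), method="Nelder-Mead").x
```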

Lemma 2.1

Let $\theta_0=(\pi,\mu_0,\mu_1)$ be the true value of $\theta$. Suppose that $\int|x|^9\,dF_0(x)+\int|x|^9\,dF_1(x)<\infty$, that $\pi\in(0,1)$ and $\mu_0\ne\mu_1$, and that $F_0$ and $F_1$ are non-degenerate distributions. Conditioning on $\Omega$, as $n\to\infty$, $\ell(\theta)$ attains its maximum value at some point $\hat\theta$ with probability 1 in the interior of the ball $\|\theta-\theta_0\|\le n^{-1/3}$. Let $\hat\lambda=(\hat\lambda_1^T,\dots,\hat\lambda_K^T)^T$ with $\hat\lambda_k=\lambda_k(\hat\theta)$. Consequently, $\hat\theta$ and $\hat\lambda$ satisfy
$$Q_{kn}(\hat\theta,\hat\lambda_k)=0\ \ \text{for }k=1,\dots,K,\qquad\text{and}\qquad Q_{0n}(\hat\theta,\hat\lambda)=0,$$
where
$$Q_{kn}(\theta,\lambda)=\frac{1}{n}\sum_{i=1}^n\frac{g(u_{ki},\theta)}{1+\lambda_k^T g(u_{ki},\theta)},\qquad
Q_{0n}(\theta,\lambda)=\frac{1}{n}\sum_{k=1}^K\sum_{i=1}^n\frac{1}{1+\lambda_k^T g(u_{ki},\theta)}\left\{\frac{\partial g(u_{ki},\theta)}{\partial\theta^T}\right\}^T\lambda_k.$$

Lemma 2.1 implies that the proposed EL estimator θˆ is consistent. Based on Lemma 2.1, we further establish the asymptotic normality of θˆ in the following theorem. This result is an extension of Theorem 1 in Qin and Lawless (Citation1994). It embraces the correlation structure of the selected elements within the random vectors.

Theorem 2.2

Assume the conditions of Lemma 2.1. Let $S_{11}=E\{g(X)g^T(X)\}$, $S_{12}=S_{21}^T=E\{\partial g(X)/\partial\theta^T\}$, and
$$\Sigma_{\mathrm{off}}=\frac{1}{K(K-1)}\sum_{1\le k\ne j\le K}E\{g(u_{k1})g^T(u_{j1})\mid\Omega\}.$$
Conditioning on $\Omega$, as $n$ goes to infinity, $\sqrt{n}(\hat\theta-\theta_0)$ converges in distribution to $N(0,V_2)$, where
$$V_2=\frac{1}{K}\,(S_{21}S_{11}^{-1}S_{12})^{-1}+\frac{K-1}{K}\,(S_{21}S_{11}^{-1}S_{12})^{-1}(S_{21}S_{11}^{-1})\,\Sigma_{\mathrm{off}}\,(S_{11}^{-1}S_{12})\,(S_{21}S_{11}^{-1}S_{12})^{-1}.$$

If $\Omega_k$ and $\Omega_j$ have no common elements, then $E\{g(u_{k1})g^T(u_{j1})\mid\Omega\}=0$. Further, if $d$ is quite large and no pair $\Omega_k$, $\Omega_j$ $(k\ne j)$ shares a common element, then $\Sigma_{\mathrm{off}}=0$ and $V_2=(S_{21}S_{11}^{-1}S_{12})^{-1}/K$. At the other extreme, if $\Omega_k=\Omega_1$ for $k=2,\dots,K$, then $\Sigma_{\mathrm{off}}=S_{11}$ and $V_2=(S_{21}S_{11}^{-1}S_{12})^{-1}$. Therefore, the second term in $V_2$ stands for the efficiency loss due to the fact that some data are used more than once.
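As a quick check of the second extreme case, substituting $\Sigma_{\mathrm{off}}=S_{11}$ into $V_2$ collapses the two terms (writing $A=S_{21}S_{11}^{-1}S_{12}$):

```latex
% All K index sets identical: Sigma_off = S_11, so
\[
V_2 = \tfrac{1}{K}A^{-1}
    + \tfrac{K-1}{K}\,A^{-1}(S_{21}S_{11}^{-1})\,S_{11}\,(S_{11}^{-1}S_{12})A^{-1}
    = \tfrac{1}{K}A^{-1} + \tfrac{K-1}{K}\,A^{-1}A\,A^{-1}
    = A^{-1},
\]
% i.e. reusing the same triple K times recovers the single-triple
% variance and contributes no additional information.
```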

3. Simulation studies and data analysis

3.1. Simulation studies

We have carried out simulations to evaluate the finite-sample performance of the proposed empirical likelihood (EL) estimators. For comparison, we have also considered two competitors: the maximum likelihood (ML) estimators under the multivariate normal mixture model, and the almost nonparametric estimators based on multinomial mixtures (Cruz-Medina et al., 2004; MN for short). Both the ML and MN estimators can be calculated by the EM algorithm.

We generate data from the mixture model (Equation3). Different specifications of component distributions f1 and f2 are listed below:

  1. (Normal mixtures) $f_1$ and $f_2$ are the density functions of $N(\mu_1,1)$ and $N(\mu_2,1)$, respectively. Here $\mu_1=0$ and $\mu_2=1$ or 2.

  2. (Non-central t mixtures) $f_1$ and $f_2$ are the density functions of $t(r,a(r)\mu_1)$ and $t(r,a(r)\mu_2)$, respectively. Here $t(r,a(r)\mu)$ denotes a t-distribution with $r$ degrees of freedom, non-centrality parameter $a(r)\mu$, and mean $\mu$, where $a(r)=\sqrt{2/r}\,\Gamma(r/2)/\Gamma((r-1)/2)$. We take $r=4$, $\mu_1=0$, and $\mu_2=1.5$ or 2.

  3. (Chi-square mixtures) $f_1$ and $f_2$ are the density functions of $\chi^2_{\mu_1}$ and $\chi^2_{\mu_2}$. Here $\mu_1=5$ and $\mu_2=10$ or 20.

For each setting, we generate 1000 samples of size $n=400$, with $d=3$ or 6 and $\pi=0.2$, 0.5, or 0.8. When $d=6$, we set $K=8$ in the proposed EL method. We calculate the biases and standard deviations of the estimators under comparison, and summarise the results in Tables 1–3.
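For concreteness, a data-generating sketch for the first (normal) setting; the key point is that each subject carries one latent component label shared by all $d$ coordinates. `simulate_mixture` is an illustrative name:

```python
import numpy as np

def simulate_mixture(n, d, pi, draw1, draw2, rng):
    """Generate n i.i.d. d-vectors from model (2): one latent label per
    subject, shared by all d conditionally independent coordinates."""
    labels = rng.random(n) < pi                 # True -> component f1
    return np.where(labels[:, None], draw1((n, d)), draw2((n, d)))

rng = np.random.default_rng(2019)
# normal-mixture setting: f1 = N(0,1), f2 = N(2,1), pi = 0.5, d = 3
X = simulate_mixture(400, 3, 0.5,
                     draw1=lambda s: rng.normal(0.0, 1.0, s),
                     draw2=lambda s: rng.normal(2.0, 1.0, s),
                     rng=rng)
```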

Table 1. Biases (%) and standard deviations (%) (in parentheses) of different estimators based on 1000 simulations with $n=400$. Data were generated from the multivariate mixture model with $f_1$ and $f_2$ being $N(\mu_1,1)$ and $N(\mu_2,1)$, respectively. Here $\mu_1=0$, $\mu_2=1$ or 2, and $d=3$ or 6.

Table 2. Biases (%) and standard deviations (%) (in parentheses) of different estimators based on 1000 simulations with $n=400$. Data were generated from the multivariate mixture model with $f_1$ and $f_2$ being $t(4,\mu_1)$ and $t(4,\mu_2/\{\sqrt{2}\,\Gamma(3/2)/\Gamma(2)\})$, respectively. Here $\mu_1=0$, $\mu_2=1.5$ or 2, and $d=3$ or 6.

Table 3. Biases (%) and standard deviations (%) (in parentheses) of different estimators based on 1000 simulations with $n=400$. Data were generated from the multivariate mixture model with $f_1$ and $f_2$ being $\chi^2_{\mu_1}$ and $\chi^2_{\mu_2}$, respectively. Here $\mu_1=5$, $\mu_2=10$ or 20, and $d=3$ or 6.

Let us first examine Table 1, where the multivariate normal mixture model is correctly specified. As expected, the ML estimators have the smallest standard deviations in all cases and the smallest absolute biases in most cases. The proposed EL estimators perform very similarly to the ML estimators, and both are uniformly better than the MN estimators. As $\mu_2$ moves further away from $\mu_1=0$, all estimators have decreasing standard deviations. This may be because the two component distributions in the mixture model also move further away from each other. When $\pi$ increases from 0.2 to 0.8, the performance of all three estimators of $\mu_1$ improves, while that of the estimators of $\mu_2$ deteriorates. This is probably because, as $\pi$ increases, the multivariate normal mixture contains increasing information about $\mu_1$ but decreasing information about $\mu_2$. All three estimators of $\pi$ perform better when $\pi$ lies in the middle of its parameter space than near its boundaries.

However, when data are generated from non-normal mixtures, the ML estimators lose their optimality. From Tables 2 and 3, we can see that, compared with the MN estimators, they have smaller absolute biases in some cases but larger standard deviations in others. The proposed EL estimators perform reasonably well, as they have uniformly smaller biases and standard deviations than the other two competitors.

If the mixing proportion is of primary interest, we see that when the multivariate normal mixture is correctly specified, the ML estimator again performs the best, and the EL estimator has almost the same reasonable performance. Both of them perform better than the MN estimator. When the model is misspecified, the EL estimator has the best performance, followed by the MN estimator. These two estimators usually outperform the ML estimator by a large margin. For example, in Table 2, when $\pi=0.5$, $\mu_2=1.5$, and $d=3$, all three estimators of $\pi$ have similar standard deviations; however, the ML estimator has a much larger absolute bias (0.3044) than the EL estimator (0.0056) and the MN estimator (0.0042).

When the data dimension $d$ increases from 3 to 6, the standard deviations of both the EL and MN estimators become smaller, but the estimators behave differently in terms of bias. The absolute biases of the EL estimators always decrease, whereas this is not the case for the ML and MN estimators. For example, in Table 3, when $\pi=0.2$ and $\mu_2=10$, the absolute bias of the MN estimator of $\mu_2$ increases from 0.5742 to 0.6359, and that of the ML estimator of $\mu_1$ increases from 0.0366 to 0.2254. By contrast, the absolute biases of the EL estimators of $(\mu_1,\mu_2)$ decrease from $(0.0958,0.0519)$ to $(0.0659,0.0133)$.

Overall, the EL method exhibits more robust performance than the MN and ML methods across different model specifications. When the normal mixture is correctly specified, the proposed EL estimators have performance comparable to the ML estimators. When the normal mixture is misspecified, the EL estimators perform uniformly better than the other two competitors.

3.2. Data analysis

The reaction time (RT) task is one of the most common experimental methods in psychology for studying individual differences. In this section, we apply our proposed empirical likelihood method to an RT data set that was analysed by Cruz-Medina et al. (2004). In this experiment, 197 nine-year-old children were tested on a mental rotation task in which a target figure was presented on the left and another figure on the right. Children had to determine whether the second figure was identical to the first or a mirror image of it. The RT was recorded in milliseconds. There were six trials, and we consider these trials as $d=6$ repeated measurements. The time delays between trials were randomly chosen so that children would be unable to anticipate the length of the delays; the trials were therefore assumed to be independent. We display only the histogram of the first measurement of the data in Figure 1; those for the remaining measurements are similar. Cruz-Medina et al. (2004) suggested using a two-component mixture to fit the heterogeneous RT distribution.

Since they are recorded in milliseconds, the RT values range from around 700 to 7000. For convenience, we rescale them to seconds; the resulting values are no greater than 10. Although the mixing proportion $\pi$ is of primary interest, we calculate the EL, MN, and ML estimates of all three parameters $\pi$, $\mu_1$, and $\mu_2$. The results are tabulated in Table 4. Based on these point estimates, we also provide 95% Wald interval estimates for all three parameters, with variances estimated by 200 bootstrap repetitions.
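The interval construction is standard: resample subjects, take the standard deviation of the replicated estimates, and form Wald intervals. A sketch, with `estimator` standing for any of the EL, MN, or ML fitting routines (assumed available):

```python
import numpy as np
from scipy.stats import norm

def wald_ci(X, estimator, B=200, level=0.95, seed=0):
    """Wald intervals with bootstrap-estimated variances.
    `estimator` maps a data matrix to (pi, mu1, mu2) estimates."""
    rng = np.random.default_rng(seed)
    n = len(X)
    est = np.asarray(estimator(X))
    boot = np.array([estimator(X[rng.integers(0, n, size=n)])
                     for _ in range(B)])           # resample whole subjects
    se = boot.std(axis=0, ddof=1)
    z = norm.ppf((1 + level) / 2)
    return est - z * se, est + z * se
```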

Figure 1. Histogram of the first measurement of the RT data.


Table 4. Point and interval estimates of the EL, MN and ML methods for $\pi$, $\mu_1$ and $\mu_2$. EL0: EL with $K=\binom{6}{3}=20$; EL1, EL2, EL3: EL with $K=8$; MN1: MN with cut points $c_1,\dots,c_{10}$ being the deciles of the empirical distribution, which was suggested by Cruz-Medina et al. (2004) for general use; MN2: MN with cut points $(c_1,\dots,c_{10})=(0.5,1,1.2,1.4,1.6,2,2.5,3,4,5)$, which was used by Cruz-Medina et al. (2004) when they analysed this dataset.

As mentioned in Section 2.2, the EL estimator depends on the $K$ randomly selected sets $\Omega_k$ $(k=1,2,\dots,K)$. Therefore, when $K<\binom{d}{3}$, we shall in general obtain different EL estimates when applying the EL method more than once. We apply the EL method with $K=8$ three times, and denote the results by EL1, EL2 and EL3, respectively. In this example, $d=6$; when $K=\binom{6}{3}=20$, the results are denoted by EL0. We see that the EL estimates with $K=8$ are very close to those with $K=20$. This confirms that the proposed random selection strategy works very well. The EL proportion estimates are all around 0.7, and the EL estimates of $\mu_1$ and $\mu_2$ are around 1.6 and 2.9, respectively.

When applying the MN method, we need to determine the cut points $c_i$. For general use, Cruz-Medina et al. (2004) suggested using 10 cut points, choosing $c_1,\dots,c_{10}$ to be the deciles of the empirical distribution of the data. The resulting MN method, denoted MN1, is also the MN method compared in our simulation study. When analysing the RT data, Cruz-Medina et al. (2004) used $(c_1,\dots,c_{10})=(0.5,1,1.2,1.4,1.6,2,2.5,3,4,5)$; we denote the resulting method by MN2. The MN results appear to depend to some extent on the choice of cut points, because the MN1 proportion estimate of 0.52 is quite different from the MN2 estimate of 0.59. Meanwhile, the MN2 point and interval estimates are both nearly equal to those of the ML method.
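For reference, the data reduction underlying the MN method can be sketched as follows: each subject's $d$ measurements are turned into counts over the intervals defined by the cut points. This is our reading of Cruz-Medina et al. (2004), and the decile rule shown is one plausible interpretation, not the paper's code:

```python
import numpy as np

def multinomial_counts(W, cuts):
    """Categorise each subject's d measurements by the cut points,
    giving a vector of bin counts per subject (MN-style reduction)."""
    bins = np.concatenate(([-np.inf], np.asarray(cuts, float), [np.inf]))
    return np.array([np.histogram(w, bins=bins)[0] for w in W])

# MN1-style cut points: 10 quantiles of the pooled data (one plausible
# reading of "deciles of the empirical distribution"):
# cuts = np.quantile(W.ravel(), np.arange(1, 11) / 11)
# MN2 cut points used for the RT data:
# cuts = [0.5, 1, 1.2, 1.4, 1.6, 2, 2.5, 3, 4, 5]
```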

According to our simulation studies, the EL method exhibits more robust performance than the MN and ML methods. This indicates that the EL analysis results are more trustworthy than those of the other two methods.

4. Discussions

In this paper, we proposed an empirical likelihood-based estimation method for the parameters of a multivariate two-component mixture model. We discussed three-variate mixtures in detail and extended the methodology to high-dimensional mixtures by giving a permutation-like method which reduces the high-dimensional problem to a three-dimensional situation. The performance and efficiency of the method are demonstrated through a real data example as well as simulation studies. The simulation results show that the proposed method is quite efficient in comparison to both completely parametric and almost nonparametric methods in the literature. Furthermore, the proposed method can accommodate parameter estimation in high-dimensional mixtures by requiring estimation only in three dimensions.

The extension of our approach to mixtures with more than two components is valuable and interesting. Similar to the two-component mixture situation, one can use a set of moment conditions implied by the mixture model to identify and estimate mixing proportions and other component parameters. When the number of components grows, the number of unknown parameters increases. The improvement in the performance of the proposed approach in terms of better identification and higher efficiency may crucially depend on the choice of the set of moment conditions. We will consider it in future research.

Acknowledgements

The authors would like to thank the editor, the AE, and the referee for their insightful comments and suggestions. The authors would like to thank Dr Jing Qin for valuable discussions and many helpful comments.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

The research is partially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grants (RGPIN-2018-05846, RGPIN-2018-05981), the National Natural Science Foundation of China (Grant Numbers 11771144, 11501354 and 11501208), and the Chinese 111 Project (B14019).

Notes on contributors

Yuejiao Fu

Yuejiao Fu is an Associate Professor of Statistics in the Department of Mathematics and Statistics at York University, Canada. She received her PhD in Statistics in 2004 from the University of Waterloo. Her research interests include mixture models, empirical likelihood, and statistical genetics.

Yukun Liu

Yukun Liu is a Professor in the School of Statistics, Faculty of Economic and Management, East China Normal University, China. He received his PhD in Statistics in 2009 from Nankai University, China. His research interests include nonparametric and semiparametric statistics based on empirical likelihood and their applications in case-control data, capture-recapture data, selection biased data, and finite mixture models.

Hsiao-Hsuan Wang

Hsiao-Hsuan Wang received her PhD in Statistics in 2010 from York University, Canada. She is now a director in Model Quantification, Enterprise Risk Management, CIBC, Canada.

Xiaogang Wang

Xiaogang Wang is a Professor in Statistics in the Department of Mathematics and Statistics of York University. He is also holding an adjunct position as a senior research fellow at the Institute of Data Science of Tsinghua University in Beijing. He received his PhD in Statistics from the University of British Columbia in 2001. His current research is on statistical analysis of complex data in health and life sciences.

References

  • Cruz-Medina, I. R., Hettmansperger, T. P., & Thomas, H. (2004). Semiparametric mixture models and repeated measures: The multinomial cut point model. Journal of the Royal Statistical Society: Series C (Applied Statistics), 53, 463–474. doi: 10.1111/j.1467-9876.2004.05203.x
  • Hall, P., & Zhou, X. H. (2003). Nonparametric estimation of component distributions in a multivariate mixture. The Annals of Statistics, 31, 201–224. doi: 10.1214/aos/1046294462
  • Hettmansperger, T. P., & Thomas, H. (2000). Almost nonparametric inference for repeated measures in mixture models. Journal of the Royal Statistical Society. Series B, 62, 811–825. doi: 10.1111/1467-9868.00266
  • Kasahara, H., & Shimotsu, K. (2014). Nonparametric identification and estimation of the number of components in multivariate mixtures. Journal of the Royal Statistical Society. Series B, 76(1), 97–111. doi: 10.1111/rssb.12022
  • Lindsay, B. G. (1995). Mixture models: Theory, geometry and applications. Hayward: Institute of Mathematical Statistics.
  • McLachlan, G. J., & Peel, D. (2000). Finite mixture models. New York: Wiley.
  • Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75, 237–249. doi: 10.1093/biomet/75.2.237
  • Owen, A. B. (2001). Empirical likelihood. New York: Chapman & Hall/CRC.
  • Qin, J., & Lawless, J. (1994). Empirical likelihood and general estimating equations. The Annals of Statistics, 22, 300–325. doi: 10.1214/aos/1176325370
  • Thomas, H., & Horton, J. J. (1997). Competency criteria and the class inclusion task: Modeling judgments and justifications. Developmental Psychology, 33, 1060–1073. doi: 10.1037/0012-1649.33.6.1060
  • Titterington, D. M., Smith, A. F. M., & Makov, U. E. (1985). Statistical analysis of finite mixture distributions. New York: Wiley.

Appendix

Since both Lemma 2.1 and Theorem 2.2 are established conditionally on the K selected sets Ωk (k=1,2,,K), for convenience we regard the K selected sets as fixed sets throughout the proofs. Note that uki's are i.i.d. random vectors for fixed k and varying i, while they are not independent for fixed i and varying k.

Proof of Lemma 2.1

We consider $\theta\in\{\theta:\|\theta-\theta_0\|=n^{-1/3}\}$, which can be written as $\theta=\theta_0+n^{-1/3}v$ with $\|v\|=1$. From Qin and Lawless (1994), we can show that $\lambda_k=O(n^{-1/3})$ and
$$\lambda_k(\theta)=\left\{\frac{1}{n}\sum_{i=1}^n g(u_{ki},\theta)g^T(u_{ki},\theta)\right\}^{-1}\left\{\frac{1}{n}\sum_{i=1}^n g(u_{ki},\theta)\right\}+o(n^{-1/3})\quad\text{(a.s.)}$$
uniformly in $\theta\in\{\theta:\|\theta-\theta_0\|\le n^{-1/3}\}$, for each $k=1,\dots,K$. By Taylor expansion, we have
$$\ell(\theta)=-\sum_{k=1}^K\sum_{i=1}^n\log\{1+\lambda_k^T g(u_{ki},\theta)\}$$
$$=-\frac{n}{2}\sum_{k=1}^K\left\{\frac{1}{n}\sum_{i=1}^n g(u_{ki},\theta)\right\}^T\left\{\frac{1}{n}\sum_{i=1}^n g(u_{ki},\theta)g^T(u_{ki},\theta)\right\}^{-1}\left\{\frac{1}{n}\sum_{i=1}^n g(u_{ki},\theta)\right\}+o(n^{1/3})\quad\text{(a.s.)}$$
$$=-\frac{n}{2}\sum_{k=1}^K\left\{\frac{1}{n}\sum_{i=1}^n g(u_{ki},\theta_0)+\frac{1}{n}\sum_{i=1}^n\frac{\partial g(u_{ki},\theta_0)}{\partial\theta}\,v\,n^{-1/3}\right\}^T\left\{\frac{1}{n}\sum_{i=1}^n g(u_{ki},\theta)g^T(u_{ki},\theta)\right\}^{-1}$$
$$\qquad\times\left\{\frac{1}{n}\sum_{i=1}^n g(u_{ki},\theta_0)+\frac{1}{n}\sum_{i=1}^n\frac{\partial g(u_{ki},\theta_0)}{\partial\theta}\,v\,n^{-1/3}\right\}+o(n^{1/3})\quad\text{(a.s.)}$$
$$=-\frac{nK}{2}\left\{O\bigl(n^{-1/2}(\log\log n)^{1/2}\bigr)+E\frac{\partial g(u,\theta_0)}{\partial\theta}\,v\,n^{-1/3}\right\}^T\left\{E\,g(u,\theta_0)g^T(u,\theta_0)\right\}^{-1}$$
$$\qquad\times\left\{O\bigl(n^{-1/2}(\log\log n)^{1/2}\bigr)+E\frac{\partial g(u,\theta_0)}{\partial\theta}\,v\,n^{-1/3}\right\}+o(n^{1/3})\quad\text{(a.s.)}$$
$$\le -(c/2)\,n^{1/3}\quad\text{(a.s.)},$$
where $c$ is the smallest eigenvalue of
$$\left\{E\frac{\partial g(u,\theta_0)}{\partial\theta^T}\right\}^T\left\{E\,g(u,\theta_0)g^T(u,\theta_0)\right\}^{-1}\left\{E\frac{\partial g(u,\theta_0)}{\partial\theta^T}\right\}.$$
Similarly,
$$\ell(\theta_0)=-\frac{n}{2}\sum_{k=1}^K\left\{\frac{1}{n}\sum_{i=1}^n g(u_{ki},\theta_0)\right\}^T\left\{\frac{1}{n}\sum_{i=1}^n g(u_{ki},\theta_0)g^T(u_{ki},\theta_0)\right\}^{-1}\left\{\frac{1}{n}\sum_{i=1}^n g(u_{ki},\theta_0)\right\}+o(1)\quad\text{(a.s.)}$$
$$=-O(\log\log n)\quad\text{(a.s.)}.$$
Since $\ell(\theta)$ is a continuous function of $\theta$ on the ball $\|\theta-\theta_0\|\le n^{-1/3}$, for large $n$, $\ell(\theta)$ must attain a maximum at some point $\hat\theta$ in the interior of this ball, and $\hat\theta$ satisfies
$$\frac{\partial\ell(\theta)}{\partial\theta}\bigg|_{\theta=\hat\theta}
=-\sum_{k=1}^K\sum_{i=1}^n\frac{\{\partial\lambda_k^T(\theta)/\partial\theta\}\,g(u_{ki},\theta)+\{\partial g(u_{ki},\theta)/\partial\theta\}^T\lambda_k(\theta)}{1+\lambda_k^T(\theta)g(u_{ki},\theta)}\bigg|_{\theta=\hat\theta}$$
$$=-\sum_{k=1}^K\sum_{i=1}^n\frac{1}{1+\lambda_k^T(\theta)g(u_{ki},\theta)}\left\{\frac{\partial g(u_{ki},\theta)}{\partial\theta^T}\right\}^T\lambda_k(\theta)\bigg|_{\theta=\hat\theta}=0.$$

Proof of Theorem 2.2

Taking derivatives with respect to $\theta^T$ and $\lambda_j^T$ at $\lambda=0$, we have
$$\frac{\partial Q_{kn}(\theta,0)}{\partial\theta^T}=\frac{1}{n}\sum_{i=1}^n\frac{\partial g(u_{ki},\theta)}{\partial\theta^T},\qquad
\frac{\partial Q_{kn}(\theta,0)}{\partial\lambda_j^T}=-\frac{1}{n}\sum_{i=1}^n g(u_{ki},\theta)g^T(u_{ji},\theta)\,\delta_{kj},$$
$$\frac{\partial Q_{0n}(\theta,0)}{\partial\theta^T}=0,\qquad
\frac{\partial Q_{0n}(\theta,0)}{\partial\lambda_k^T}=\frac{1}{n}\sum_{i=1}^n\left\{\frac{\partial g(u_{ki},\theta)}{\partial\theta^T}\right\}^T,$$
for $k,j=1,\dots,K$, where $\delta_{kj}$ is the Kronecker delta. Expanding $Q_{kn}(\hat\theta,\hat\lambda_k)$ and $Q_{0n}(\hat\theta,\hat\lambda)$ around $(\theta_0,0)$, we have
$$0=Q_{kn}(\hat\theta,\hat\lambda_k)=Q_{kn}(\theta_0,0)+\frac{\partial Q_{kn}(\theta_0,0)}{\partial\lambda_k^T}(\hat\lambda_k-0)+\frac{\partial Q_{kn}(\theta_0,0)}{\partial\theta^T}(\hat\theta-\theta_0)+o_p(\delta_n),$$
$$0=Q_{0n}(\hat\theta,\hat\lambda)=Q_{0n}(\theta_0,0)+\sum_{k=1}^K\frac{\partial Q_{0n}(\theta_0,0)}{\partial\lambda_k^T}(\hat\lambda_k-0)+\frac{\partial Q_{0n}(\theta_0,0)}{\partial\theta^T}(\hat\theta-\theta_0)+o_p(\delta_n),$$
where $\delta_n=\|\hat\theta-\theta_0\|+\sum_{k=1}^K\|\hat\lambda_k\|$.

It follows from the above equations that
$$\begin{pmatrix}\hat\lambda\\ \hat\theta-\theta_0\end{pmatrix}=S_n^{-1}\begin{pmatrix}-D_n\\ 0\end{pmatrix}+o_p(\delta_n). \tag{A1}$$
Here
$$D_n=\begin{pmatrix}Q_{1n}(\theta_0,0)\\ \vdots\\ Q_{Kn}(\theta_0,0)\end{pmatrix},\qquad
S_n=\begin{pmatrix}S_{11n} & S_{12n}\\ S_{21n} & S_{22n}\end{pmatrix},$$
where
$$S_{11n}=\left\{\frac{\partial Q_{kn}(\theta_0,0)}{\partial\lambda_j^T}\right\}_{1\le j,k\le K}=-\,\mathrm{diag}\left\{\frac{1}{n}\sum_{i=1}^n g(u_{1i},\theta_0)g^T(u_{1i},\theta_0),\ \dots,\ \frac{1}{n}\sum_{i=1}^n g(u_{Ki},\theta_0)g^T(u_{Ki},\theta_0)\right\},$$
$$S_{12n}=\left(\frac{\partial Q_{1n}(\theta_0,0)}{\partial\theta^T},\dots,\frac{\partial Q_{Kn}(\theta_0,0)}{\partial\theta^T}\right)^T=\left(\frac{1}{n}\sum_{i=1}^n\frac{\partial g(u_{1i},\theta_0)}{\partial\theta^T},\ \dots,\ \frac{1}{n}\sum_{i=1}^n\frac{\partial g(u_{Ki},\theta_0)}{\partial\theta^T}\right)^T,$$
$S_{21n}=S_{12n}^T$, and $S_{22n}=\partial Q_{0n}(\theta_0,0)/\partial\theta^T=0$.

Define $\mathcal{S}_{11}=-I_K\otimes S_{11}$ and $\mathcal{S}_{12}=1_K\otimes S_{12}$, where $\otimes$ is the Kronecker product operator. Under the conditions of Theorem 2.2, as $n\to\infty$, it can be verified that $S_{11n}=\mathcal{S}_{11}+o_p(1)$ and $S_{12n}=\mathcal{S}_{12}+o_p(1)$, and therefore $S_n=S+o_p(1)$, where
$$S=\begin{pmatrix}\mathcal{S}_{11} & \mathcal{S}_{12}\\ \mathcal{S}_{21} & \mathcal{S}_{22}\end{pmatrix}=\begin{pmatrix}-I_K\otimes S_{11} & 1_K\otimes S_{12}\\ 1_K^T\otimes S_{12}^T & 0\end{pmatrix}.$$
In addition, $\sqrt{n}D_n$ converges in distribution to $N(0,\Sigma)$, where $\Sigma=\bigl[E\{g(u_{k1})g^T(u_{j1})\mid\Omega\}\bigr]_{1\le k,j\le K}$. Therefore, $\delta_n=O_p(n^{-1/2})$. Since the inverse of $S$ is
$$S^{-1}=\begin{pmatrix}\mathcal{S}_{11}^{-1}+\mathcal{S}_{11}^{-1}\mathcal{S}_{12}\mathcal{S}_{22.1}^{-1}\mathcal{S}_{21}\mathcal{S}_{11}^{-1} & -\mathcal{S}_{11}^{-1}\mathcal{S}_{12}\mathcal{S}_{22.1}^{-1}\\ -\mathcal{S}_{22.1}^{-1}\mathcal{S}_{21}\mathcal{S}_{11}^{-1} & \mathcal{S}_{22.1}^{-1}\end{pmatrix},$$
where $\mathcal{S}_{22.1}=-\mathcal{S}_{21}\mathcal{S}_{11}^{-1}\mathcal{S}_{12}$, we further have
$$\sqrt{n}(\hat\theta-\theta_0)=\mathcal{S}_{22.1}^{-1}\mathcal{S}_{21}\mathcal{S}_{11}^{-1}\,\sqrt{n}D_n+o_p(1),$$
which converges in distribution to $N(0,V_2)$ with
$$V_2=\mathcal{S}_{22.1}^{-1}\mathcal{S}_{21}\mathcal{S}_{11}^{-1}\,\Sigma\,\mathcal{S}_{11}^{-1}\mathcal{S}_{12}\mathcal{S}_{22.1}^{-1}. \tag{A2}$$
With some algebra, it can be seen that $\mathcal{S}_{22.1}=K\,S_{21}S_{11}^{-1}S_{12}$ and $\mathcal{S}_{21}\mathcal{S}_{11}^{-1}=-1_K^T\otimes(S_{21}S_{11}^{-1})$, which implies
$$V_2=K^{-2}(S_{21}S_{11}^{-1}S_{12})^{-1}\left[\sum_{k,j=1}^K S_{21}S_{11}^{-1}E\{g(u_{k1})g^T(u_{j1})\mid\Omega\}S_{11}^{-1}S_{12}\right](S_{21}S_{11}^{-1}S_{12})^{-1}$$
$$=K^{-1}(S_{21}S_{11}^{-1}S_{12})^{-1}+\frac{K-1}{K}(S_{21}S_{11}^{-1}S_{12})^{-1}(S_{21}S_{11}^{-1})\,\Sigma_{\mathrm{off}}\,(S_{11}^{-1}S_{12})(S_{21}S_{11}^{-1}S_{12})^{-1}.$$
This finishes the proof of Theorem 2.2.
