1,923
Views
1
CrossRef citations to date
0
Altmetric
Research Articles

A Semiparametric Approach for Structural Equation Modeling with Ordinal Data

ABSTRACT

There is currently a lack of methods for non-linear structural equation modeling (NSEM) for non-parametric relationships between latent variables when data are ordinal. To this end, a semiparametric approach for flexible NSEMs without parametric forms is developed for ordinal data. An indirect application of a finite mixture of structural equation models (SEMM) is employed for modeling the conditional expected mean of endogenous latent variables. In this context, the latent classes are not to be interpreted as groups of observations belonging to those classes, rather they serve as means to model flexible non-linear functions as locally linear functions which together approximate a globally non-linear function. The proposed method is based on a hybrid of direct maximization and expectation-maximization algorithms. Two simulation studies are performed which show that parameter estimates are associated with low bias and a non-linear functional form is satisfactorily estimated using the proposed approach.

Introduction

In recent decades, structural equation modeling (SEM) has become available to applied researchers, particularly in social sciences, thanks to software packages such as LISREL (Jöreskog & Sörbom, Citation1993, Citation1996) and Mplus (Muthén & Muthén, Citation2020) by modeling the covariance structure assuming normally distributed latent variables. Most SEM approaches have historically focused on linear relationships between latent variables. In this paper, we will refer the exogenous variables as the variables that are not determined by other variables in the model, whereas endogenous variables as the variables determined by other variables in the model. However, it is frequently necessary to incorporate more flexible functions in order to capture the true relationships. See Ajzen and Madden (Citation1986), Ajzen (Citation1991), and Agustin and Singh (Citation2005) for examples of such considerations. When the structural relationship is nonlinear, the latent endogenous variables are not normally distributed. Hence, the parameters cannot be determined only based on the covariance structure. Subsequently, new approaches were developed starting with Kenny and Judd (Citation1984), where products of indicators were used as indicators corresponding to product and interaction terms. This approach was further developed by Jaccard and Wan (Citation1995), Jöreskog and Yang (Citation1996), Kelava and Brandt (Citation2009), and Wall and Amemiya (Citation2001). By forming products of indicators for interaction and quadratic terms, estimation of nonlinear structural models was made possible in (at the time) existing software. The product indicator approaches have been criticized for their ad-hoc nature and arbitrariness in terms of which indicators to include (e.g.,Wall (Citation2009) and Harring et al. (Citation2012)). Since then, the methodological development for estimating nonlinear structural equation models (NSEM) has continued by maximum likelihood-based approaches, among which Klein and Moosbrugger (Citation2000) and Klein and Muthén (Citation2007) are the most established methods. Bayesian approaches of Arminger and Muthén (Citation1998) and Zhu and Lee (Citation1999), and two-stage least square approaches of Bollen (Citation1995, Citation1996) have also been considered. Most of the development for NSEM has been focused on models with continuous indicators, although there are exceptions such as Lee and Zhu (Citation2000), Lee and Song (Citation2003), Rizopoulos and Moustaki (Citation2008), and Muthén (Citation2001).

In all methods mentioned above, the functional form of the structural model is specified. However, it is sometimes desirable to relax this requirement in order to model a more general family of functions. Nonparametric approaches, such as Song et al. (Citation2014), and semiparametric approaches, e.g., Bauer (Citation2005), Pek et al. (Citation2011), Song and Lu (Citation2010), Finch (Citation2015), and Guo et al. (Citation2012) provide the opportunity to estimate models with unspecified functional forms. In the finite mixture of structural equation models (SEMMs), e.g., Bauer (Citation2005) and Pek et al. (Citation2011), latent exogenous variables are modeled as mixtures of multivariate normal distributions. The mixture-based semiparametric approaches have two directions of purposes. In the direct approach, the groups are interpreted as real subpopulations and, hence, group-specific structural parameters are interpreted as parameters associated with the corresponding subpopulation. On the other hand, in the indirect approach, the groups are not to be interpreted as subpopulations in any substantive sense. Instead, they provide means to model nonnormal latent variables as mixtures of normal distributions as in the nonlinear structural equation mixture model (NSEMM) by Kelava et al. (Citation2014). Hence, model estimation, robust to nonnormal exogenous latent variables, is possible in a parametric context. Another application of the indirect approach, as in the SEMM by Bauer (Citation2005) and Pek et al. (Citation2011), is to model a nonlinear relationship as weighted group-specific linear functions. In this approach, both nonlinearity and nonnormality in the latent variables are modeled by the normal mixture, resulting in the flexible model estimation, robust to nonnormal latent exogenous variables.

There is, at present, a lack of non- and semiparametric approaches for latent variable modeling in the presence of ordinal data, although there are a few exceptions, e.g., Song et al. (Citation2013) who extended the penalized Bayesian P-splines approach of Song and Lu (Citation2010) to incorporate mixed data types. It has been demonstrated that treating ordinal data as continuous can produce bias (Finney & DiStefano, Citation2006; Rhemtulla et al., Citation2012). In this article, a semiparametric approach, based on the work of Bauer (Citation2005) and Pek et al. (Citation2011) is developed to allow for ordinal indicators based on quasi-maximum likelihood estimation, combining the direct maximization of the approximated log-likelihood of Jin et al. (Citation2020) and the expectation-maximization (EM) algorithm of Dempster et al. (Citation1977).

The article is organized as follows. First, the model is outlined, followed by the estimation procedure. Two simulation studies are reported in the following section. The first study recognizes that accurate, group-specific parameters are obtained, whereas the second study shows that a structural model of quadratic form is satisfactorily estimated using the developed approach. The article concludes with discussion and conclusion. An appendix is provided with technical details.

The semiparametric model

In the SEMM, as proposed by Bauer (Citation2005) and Pek et al. (Citation2011), a weighted sum of linear components is used to model the conditional expected value E(η|ξ) as

(1) Eη|ξ=m=1MPm|ξτ(m)+Γ(m)ξ,(1)

where ξ is a Kξ-dimensional vector of latent exogenous variables, η is a K\etaa-dimensional vector of latent endogenous variables, M is the number of components in the mixture, τ(m) is a K\etaa-vector of intercepts for component m, Γ(m) is a K\etaa×Kξ-matrix representing linear effects of component m, and the conditional probability of component m is

(2) Pm|ξ=π(m)NKξξ;\mua(m),Φ(m)l=1Mπ(l)NKξξ;\mua(l),Φ(l),(2)

where NKξ() is the multivariate normal density function, π(m) is the population proportion of component m, μ(m) is the population mean vector of dimension Kξ×1 corresponding to component m and Φ(m) is the population covariance matrix of ξ corresponding to component m of dimension Kξ×Kξ.

Only continuous data are considered by Bauer (Citation2005) and Pek et al. (Citation2011). In this work, their approach is extended to work for ordinal data. Define Xj to be the jth ordinal indicator of ξ and Yj to be the jth ordinal indicator of η.Footnote1 The number of indicators corresponding to ξ and η is denoted by ρx and ρy, respectively, such that x and y are the ρx and ρy dimensional realizations of Xj, for j=1,2,3,ρx and Yj for j=1,2,,ρy, respectively. The measurement model relating the continuous latent variables ξ and η to the ordinal observed variables x and y is then

(3) gj(x)PXjuj|ξ=αx,j(uj)βx,jTξ,uj1,,Uj,j=1,2,,ρx,(3)
(4) gj(y)PYjvj|η=αy,j(vj)βy,jTη,vj1,,Vj,j=1,2,,ρy,(4)

where gj(x) and gj(y) are the link functions, βx,jT is the ρx-vector of factor loadings of indicator Xj, βy,jT is the ρy-vector of factor loadings of indicator Yj, and Uj and Vj are the number of categories of corresponding indicator. Here we let αx,j(Uj)= and αy,j(Vj)=.

Maximum likelihood estimation

The joint density function of y, x, η, ξ and m is referred to as fy,x,η,ξ,m. We assume that the observed variables y and x are independent, conditioned on the latent variables η and ξ and component m. The distribution of y is determined by η and the distribution of x is determined by ξ from the measurement model (EquationEquations (3) and (Equation4)). The measurement model does not depend on the group m. The SEMM (1) indicates that the distribution of η is determined by ξ and component m and that the distribution of ξ is determined by the component m. Thus, the joint density function is equivalent to

(5) fy,x,η,ξ,m=fy|ηfx|ξfη|ξ,mfξ|mfm.(5)

where fy|η and fx|ξ are given by the measurement model (3) and (4),

(6) ξ|mNξ;\mua(m),Φ(m)(6)
(7) fm=π(m),m=1,2,,M,(7)

with the restriction of total probability m=1Mπ(m)=1. The conditional distribution of η given ξ and component m is assumed to be

(8) η|ξ,mNη;τ(m)+Γ(m)ξ,Ψ.(8)

In the current setting Ψ is the same for all M components for simplicity. This restriction could easily be lifted. Neither the latent variables ξi and ηi, nor the components mi, are observed, where subscript i refers to observation i. Hence, under the assumption of independent observations, the objective function for maximization is the observed log-likelihood

(9) θ=i=1nlogmi=1Mfyi,xi,Υi,midΥi,(9)

where θ is the vector of unrestricted parameters, n is the number of observations and ΥiT=ηiT,ξiT, i.e., the vector of latent variables of the ith observation. Similarly, yi and xi are vectors of observed indicators of the ith observation, and mi represents the mith component.

Estimation

The aim is to maximize (θ) with respect to θ. In situations with missing data (or latent variables), it is common to employ the EM algorithm (Dempster et al., Citation1977). However, the EM algorithm is known for its slow convergence. Jin et al. (Citation2020) proposed to estimate an extended NSEM by a quasi-Newton method where the exact gradient of the approximated log likelihood is calculated for finding the maximum of (θ) without using EM, hence, referred to as direct maximization (DM). For the extended NSEM, we expect DM to be computationally more efficient than EM. Instead of purely relying on the EM algorithm, we propose to use a hybrid of EM and DM in Jin et al. (Citation2020) in order to estimate the SEMM. The developed version of the EM algorithm is expected to be reasonably computationally efficient since there are closed-form solutions to the parameter updates in one EM step. For the purpose of presentation, a sketch of the proposed algorithm is described here. A detailed description of the procedure can be found in the appendix.

Let U(max) and V(max) be the maximum number of categories of the indicators corresponding to ξ and η, respectively. Then let αx be a ρx×U(max)1-matrix with element αx,j(uj) on row j and column uj, \alphaa y be a ρy×V(max)1-matrix with element \alphaay,j(vj) on row j and column vj. Let βx be a ρx×Kξ-matrix with the row vector \betaax,jT on row j and \betaay be a ρy×Kη-matrix with βy,jT on row j. For convenience, the vector of unconstrained parameters is partitioned as θT=θ1T,θ2T, where θ1 is a vector of unrestricted elements in αy, βy, αx and βx and θ2 is a vector of unrestricted parameters in Ψ, \tauu(1), …, \tauu(M), Γ(1), Γ(M), Φ(1), …, Φ(M), \mua(1), …, \mua(M), π(1), …, π(M). For a given θ2, θ1 only consists of group invariant parameters in the measurement model. We propose to maximize θ1,θ2 with respect to θ1 for fixed θ2 by DM. No closed-form updates of θ1 are found. Hence, the quasi-Newton’s method is used to update θ1 numerically. For a given θ1, θ2 is updated using the EM algorithm. EM is used for some parameters since the logarithm of the sum over the components mi in (9) generates instability in the gradient of (θ). Further, in the proposed version of EM, the conditional maximization has closed-form solutions.

Both the closed-form updates in EM and the θ1,θ2 to be maximized in DM involve intractable integrals and approximations are required. In DM, the log-likelihood θ1,θ2 is approximated by the adaptive Gauss-Hermite quadrature (AGHQ, Liu & Pierce, Citation1994). The reader is directed to the appendix for the expression of an AGHQ approximation. For an unimodal function, the error rate of the AGHQ approximation is Oρx+ρy(q+2)/3 (Jin & Andersson, Citation2020) where takes the largest integer that is less than the enclosed value and q is the number of quadrature points per latent variable. The AGHQ approximation was accurate for the extended NSEM investigated by Jin et al. (Citation2020) for q5.

The exact gradient of the approximated log-likelihood is calculated in the quasi-Newton’s method, which is similar to the method of Jin et al. (Citation2020). In EM, the integrals in the closed-form solutions are also approximated by the AGHQ and are expected to be accurate for large samples and sufficiently large q. See the appendix for details. A pseudocode of the proposed approach is presented in Algorithm 1 at iteration k.

Algorithm 1 Proposed approach: Hybrid of EM and DM

while convergence criterion not met do

(a) DM Step: Conditional on θ2(k), update θ1 by maximizing the approximated θ1,θ2(k) using a quasi-Newton’s method

(b) EM step: Conditional on θ1(k+1), update θ2 with approximated closed-form solutions

end

Restrictions

In the SEM framework, a common assumption is that EΥ=0. In the proposed semiparametric setting, it is more convenient to put restrictions in the measurement model as explained below. It is, however, whenever appropriate, possible to perform a change of variables such that the new variables, Υ=ΥEΥ, where EΥ is a function of model parameters. The transformed variables satisfy EΥ=0. The common assumption that Eξ=0, corresponds to Kξ restriction equations for proportions and means of exogenous latent variables. The corresponding maximization equations in the EM step then lack closed-form solutions. In the current work, we define Kξ restrictions on αx instead. Impose the restrictions \alphaax,s(1)=0 where s is an index corresponding to a row of αx and on the same row in βx there is a restricted element setting the scale of a latent variable. There is one such restriction per latent exogenous variable giving rise to Kξ restrictions. Consider a change of variables for \xia=\xia\muaξ, where μξ=E\xia=m=1Mπ(m)\mua(m)0. In the translated variables we have E\xia=0. Then, the measurement model, e.g., corresponding to the indicator xs, is

gs(x)PXs1|ξ=αx,s(1)k=1Kξβx,sTξ.

where \alphaax,s(1)=0 and ξ=\xia+μ\xia. Similarly,

\mua\etaa=Eη=m=1Mπ(m)\tauu(m)+Γ(m)\mua(m)0,

in general. Correspondingly K\etaa number of restrictions for η are also needed. For this reason, the restriction τ(1)=0 is imposed. Then, the change of variables η=ημη gives Eη=0. Thus, it is possible and straightforward to alternate between the sets of variables ηT,\xiaT, for which the estimation procedure is designed, and ηT,ξT which has mean zero. For example, the population latent regression in Simulation Study 2 in the next section is η=0.5+0.5ξ+0.5ξ2+ζ, which is based on the zero-mean parametrization, namely, E(ξ)=E(\etaa)=0. The linear predictor for the first indicator is \alphaax,1(1)β1(x)ξ, where \alphaax,1(1)0, when generating the ordinal values. During estimation in our parametrization, we restrict \alphaax,1(1)=0 and \tauu(1)=0. The linear predictor for the first indicator becomes \betaa1(x)ξ and the latent regression in the first group is Γ(1)ξ. Consequently, neither E(\etaa) nor E(ξ) is necessarily zero. Instead, they are μξ=m=1Mπ(m)\mua(m) and \mua\etaa=m=1Mπ(m)\tauu(m)+Γ(m)\mua(m), respectively. We can easily transform the estimates from our parametrization to the zero-mean parametrization using ξ=ξμξ and \etaa=\etaa\mua\etaa . In particular, the \betaa and Γ(m), for all m, remain the same. The measurement models in two parametrizations can be transformed by

\alphaax,j(uj)\betaax,jξ=\alphaax,j(uj)\betaax,j\muaξ\betaax,jξ,
\alphaay,j(vj)\betaay,j\etaa=\alphaay,j(vj)\betaay,j\mua\etaa\betaay,j\etaa.

For the latent regression, our parametrization yields E\etaa|ξ,m=\tauu(m)+Γ(m)ξ, which indicates that

E(*|ξ*,m)=(+(m))+Γ(m)ξ.

Hence, the conditional mean in the zero-mean parametrization can be obtained by

E\etaa|ξ=m=1MPm|ξ\tauu(m)+Γ(m)\muaξ\mua\etaa,

where Pm|ξ=Pm|ξ.

Simulations

In order to investigate the performance of the proposed method, two simulation studies are performed. In the first study, data is generated using prespecified, true model parameters, and the accuracy of the parameter estimation is investigated. In the second study, the data is generated using a quadratic function. Hence, no true parameters are known (or even existing). The degree to which the procedure captures the true nonlinear function is investigated, without having access to true parameters. The code is written in R (R Core Team, Citation2018) and will be provided upon request and will be released as an R package. In the simulations, the estimation stops when the maximum absolute difference in the parameters between two consecutive iterations is less than 5104. In each simulation study, 1000 samples are generated and estimated using 5 quadrature points. For evaluation purposes in Simulation Study 1, the relative bias RBp=100R1i=1R\thetaap1(\thetaaˆp,i\thetaap), root mean squared error RMSEp=R1i=1R\thetaaˆp,i\thetaap2, and standard deviation SDp=R1i=1R\thetaaˆp,i\thetaaˉp2 are calculated for each structural parameter θp, where R is the number of replications and \thetaaˆp,i is the corresponding estimate at the ith replication, and \thetaaˉp is the average estimate of the parameter \thetaap.

Simulation study 1

Simulation design

In the first simulation study, data is generated using two bivariate normally distributed components with population proportions π(1)=π(2)=0.5 of the two latent exogenous variables (ξ1 and ξ2). Within each component, the variances are 0.5 and covariances 0.2. In this setting, the component means (\mua(1) and \mua(2)) are not free parameters, rather functions of other parmeters, as explained in the previous section. One latent endogenous variable \etaa is generated conditioned on ξ1 and ξ2 from a normal distribution with conditional mean given by (1) and (2), with an error term with variance Ψ11=1.0. The generated latent variables are used to generate the ordinal indicators yi and xi from a measurement model given by (3) and (4) with the probit link function. For setting the location of ξ, \alphaax,1(1)=\alphaax,4(1)=0. Similarly, restrict the local intercept \tauu1(1)=0, for setting the location on \etaa. Furthermore, the zero elements in βx are restricted to zero, which gives a simple structure.Footnote2 In order to set the scale of the latent variables \betaax,11=\betaax,42=\betaay,11=2. The model parameters of the measurement model are then given by

αxT=00.040.0800.040.073.753.372.993.753.372.994.964.473.994.964.473.996.385.795.216.385.805.22βxT=21.801.6000000021.801.60αyT=  2.832.582.32  0.870.790.71  1.781.611.45  4.253.863.48βyT=21.801.60.

The remaining, free model parameters are the local, linear parameters Γ(1)=1.0,1.0, Γ(2)=1.0,1.0 and the local intercept τ1(2)=4.0. The parameters are such that the R2 of the structural model is approximately 0.69. Reliabilities are between 0.79 and 0.86 for x and between 0.86 and 0.91 for y. Furthermore, the marginal probabilities of the respective categories are 0.15, 0.45, 0.15, 0.15, 0.10 for all items in x, and 0.20,0.20, 0.30, 0.20, 0.10 for all items in y. The sample sizes investigated are n={500,1000}.

Simulation results

Two (out of 1000) replications failed to converge for n=500 and all samples converged for n=1000. In line with the literature (Flora & Curran, Citation2004), relative biases larger than 5 are considered to be moderate to substantial. In (n=1000) and (n=500) the population parameter values (\thetaap), average estimates (\thetaaˉp), RB, 10SD and 10RMSE are reported for the parameters associated with the structural, semiparametric model. The proposed semiparametric method provides low biases for all parameters. The RMSE is almost exclusively constituted by the variance of the estimates.

Table 1. Parameter and corresponding average estimates using AGHQ approximation with 5 quadrature points per latent dimension and 1000 replications. RB, SD, and RMSE stand for relative bias, standard deviation, and root mean squared error, respectively. The sample size is N=1000

Table 2. Parameter and corresponding average estimates using AGHQ approximation with 5 quadrature points per latent dimension and 1000 replications. RB, SD, and RMSE stand for relative bias, standard deviation, and root mean squared error, respectively. The sample size is n=500

Simulation study 2

Simulation design

In the second simulation study, data (n=1000) were generated using the same quadratic function as Bauer (Citation2005), namely

(10) \etaa=0.5+0.5\xia+0.5\xia2+ζ,(10)

where \xia is generated from a standard normal distribution, the error term ζ is generated from a normal distribution with mean 0 and variance 0.25. The stars are used because E(\etaa)=E(ξ)=0. Hence, after estimation, the estimated function is shifted as described before. The parameters of the measurement model are given by

αxT=00.200.102.972.632.393.853.593.225.244.844.19βxT=21.801.60αyT=1.701.581.420.670.580.460.890.750.862.882.822.45βyT=21.801.60,

where \alphaax,11=0 and \betaay,11=\betaax,11=2 are imposed restrictions. This gives an R2 of the structural model of approximately 0.71 and reliabilities between 0.73 and 0.81 for all indicators. The marginal probabilities in respective categories are the same as in those of Simulation Study 1. The sample size is 1000.

Simulation results

The conditional expected mean E\etaa|ξ is shown as a function of ξ in when 2 and 3 components are used (left and middle panels), respectively. The bold line represents the population function in Equationequation (10), whereas the thin line represents the average estimated conditional mean function. Ninety-five percent of the estimated functions are within the shaded areas. With two components the functional form is captured to some degree, although the performance is substantially better using three components. Although the average result is acceptable, in some instances, the form of the conditional expected mean is jumping peculiarly when one of the proportions is estimated to be close to zero. In practice, it is difficult to know how many components to use. One possibility is to use the Akaike information criterion (AIC) for selecting the number of components. The AIC is defined as

AIC=2k2θˆ,

Figure 1. The conditional mean of the true data generating process (thick line) and the average estimated conditional mean (thin line) with empirical 95% intervals (filled blue areas) for 2 components (left), 3 components (middle) and when AIC is used for selecting the number of components (right)

Figure 1. The conditional mean of the true data generating process (thick line) and the average estimated conditional mean (thin line) with empirical 95% intervals (filled blue areas) for 2 components (left), 3 components (middle) and when AIC is used for selecting the number of components (right)

where k is the number of estimated parameters and \thetaaˆ is the approximated log likelihood at the mode. In the simulations, 3 components are recommended in approximately 84% of the replications. In the right panel of , each replication has been estimated using 2 and 3 components and the model with the lowest AIC has been selected. The average estimated conditional mean function (the thin line) captures the population function well.

Discussion and conclusion

In NSEM it is common to consider terms corresponding to interaction and quadratic effects for modeling nonlinearities in the structural model. It is sometimes appropriate to consider more flexible functions with less, or no a priori assumptions about the functional form. Such models are less common in the literature, particularly models which consider ordinal data. In this article, a semiparametric structural model (Bauer, Citation2005; Pek et al., Citation2011) is proposed in the presence of ordinal data. The structural model consists of a weighted sum of locally linear functions, which together approximate a globally nonlinear function. A hybrid of a direct maximization approach (Jin et al., Citation2020) and an expectation-maximization algorithm (Dempster et al., Citation1977) is suggested and implemented for maximizing the marginal log-likelihood of the observed ordinal data. Two simulation studies are performed. The first shows that the proposed method provides parameter estimates with low relative bias, whereas the second study illustrates how the method accurately captures a nonlinear structural model in terms of the conditional mean of the latent endogenous variable. The accurate parameter estimates and performance in estimating the conditional mean suggest that the AGHQ approximations are of high quality for q5 in models similar to those investigated in this study. Two and three components are used in the second study. Using two components yields a reasonable and stable functional form, whereas three components yield a more accurate functional form on average, with the drawback of occasionally unstable estimates. The number of components might possibly be selected using AIC. Advantages of the proposed method include the capacity of modeling flexible structural models in the presence of ordinal data when the latent exogenous variables are not necessarily normal. In this study, little attention is devoted to the latter point. A suggestion for future studies is to investigate the performance of the proposed method under moderate to severe nonnormality of latent exogenous variables. This is left as a future project.

Correction Statement

This article has been republished with minor changes. These changes do not impact the academic content of the article.

Additional information

Funding

The research reported in this article has been supported by Vetenskapsrådet (Swedish Research Council) under the program “New Statistical Methods for Latent Variable Models, 2017-01175”.

Notes

1 The subscripts on X and Y should in principle be different, e.g., jx and jy but for notational convenience, we use the same (j) for X and Y.

2 A simple structure is not necessary, but in this first study, a simple structure is employed for simplicity.

References

Appendix A

Integral approximation

The integrals in Equationequation (9) are intractable. In this work, such integrals are approximated using the AGHQ approximation. The Gauss-Hermite quadrature approximation of the K-dimensional integral expgtdt is a weighed sum of function values as

(11) k1=1qkK=1qwk1k2kKexpgzk1,,zkK,(11)

where

(12) wk1k2kK=j=1Kwkjexpzkj2,(12)

where zkj and wkj are the Gauss-Hermite quadrature points and weights, respectively. q is the number of quadrature points per dimension. In adaptive Gauss-Hermite quadrature approximation, proposed by Liu and Pierce (Citation1994), the quadrature points are translated and dilated to improve the approximation of unimodel functions gt.

Appendix B

Maximum Likelihood Estimation

Define

(13) hi,miΥi;θ=logfyi,xi,Υi,mi;θ.(13)

Then, (5) yields

hi,miΥi;θ=logfyi|ηi+logfxi|\xiai+logfηi|ξi,mi+f\xiai|mi+logfmi.

EquationEquations (3), (Equation4), (Equation6), (Equation7), (Equation8) and (Equation13) give

hi,miΥi;θ=j=1ρxlogPXij=uij|ξi+j=1ρxPYij=vij|ηi
Kη2log2π12log|Ψ|12ηiτ(mi)\Gammaa(mi)ξiTΨ1ηiτ(mi)\Gammaa(mi)ξi
Kξ2log2π12log|Φ(mi)|12μ(mi)ξiTΦ(mi)1\mua(mi)ξi+logπ(mi).

Measurement Model Estimation

Since the sum in (9) does not depend on the parameters in θ1, direct maximization, as proposed by Jin et al. (Citation2020) is implemented for θ1 given the current estimates of θ2. The observed likelihood is

(14) Lθ=i=1nLiθ=i=1nmi=1Mexphi,miΥi;θdΥi.(14)

where Li is the observed likelihood of the ith observation. Let the mode Υˆi,m be the solution to

hi,miΥi=0,

and the Hessian be

Hi,mi=2hi,miΥiΥiT,

evaluated at the mode. Further, let Li,miLi,miT=Hi,miΥˆi,mi;θ1. Applying the AGHQ approximation using Equationequations (11) and (Equation12) to the ith factor of Equationequation (14), gives

Liθmi=1MexpK2log212logHi,miΥˆi,mi;θ×
(15) k1=1qkK=1qwk1,k2,,kKexphiΥ˜i,mi(zk1,zk2,,zkK);θ.(15)

where Υ˜i,mi(k1,k2,,kK)=2Li,mizk1,k2,,kK+Υˆi,mi are the translated and dilated latent variables about the mode. And with iθ=logLiθ the corresponding observed log-likelihood is

θ=i=1niθi=1nlogmi=1Mei,m(aghq)θ,Υˆi,mi,

where i,m(aghq)θ,Υˆi,mi is the log of the mith term in Equationequation (15). θ1 is updated by solving

(16) θ1=0(16)

with respect to θ1, using the approximation in Equationequation (15). Then Equationequation (16) is

θ1i=1nmi=1Mi,mi(aghq)(θ,Υˆi,mi)θ1expi,mi(aghq)(Υˆi,mi;θ)mi=1Mexpi,mi(aghq)(Υˆi,mi;θ).

Acknowledging that the mode Υˆi,mi depends on θ, the gradient of i,mθ,Υˆi,m(θ) is

(17) i,miθ,Υˆi,mi(θ)θ1=i,mi(θ)θ1|Υ=Υˆi,mi+Υˆi,mi(θ)θTTi,mi(Υ)Υ|Υ=Υˆi,mi,(17)

where i,mi(θ)/θ1 is the derivative of i,mi with respect to θ1, treating Υˆi,mi as constant, whereas i,mi(Υ)/Υ is the derivative of i,mi with respect to the latent variables, treating θ as fixed. The first factor in the second term of (17) is, by the implicit function theorem

Υ^i,m(θ)θ1T=(2hi(Υ;θ)ΥΥT|Υi,m=Υ^i,m)12hi,m(Υ;θ)Υθ1T|Υi,m=Υ^i,m.

In order to find the solution to Equationequation (16) the BFGS approximation to the Newton–Raphson method is implemented. The inverse of Hi,mi is approximated and updated by low-rank approximation at each iteration. The updated parameter vector θ1 is obtained by updating θ1 with the BFGS algorithm a number of times.

Structural Model Estimation

If the quasi-Newton-Raphson method proposed in the previous section would be used, the sum in Equationequation (14) introduces instability for parameters, which depend on mi. All elements in θ2 depend on mi except for Ψ. Ψ is the same for all mi in this setting only for simplicity. The reason for keeping Ψ in θ2 is for convenience at later stages in more general settings. The EM algorithm of Dempster et al. (Citation1977) is implemented for updating θ2 given the current estimates of θ1, as follows:

Treating Υi and mi as random variables, let the conditional distribution of Υi and mi be

(18) fiΥi,mi|yi,xi;θ1,θ2(old)=fiyi,xi,Υi,mi;θ1,θ2(old)m=1Mfiyi,xi,Υi,mi;θ1,θ2(old)dΥi(18)

at the current iteration given the parameter values obtained in the previous iteration, θ2(old). The update of θ2 is the vector that maximizes the expected value of the complete log-likelihood given the conditional distribution in Equationequation (18). Throughout this section, θ1 is fixed to the current estimate. Hence, for simplicity, θ1 is dropped. Then, the objective function is

(19) Qθ2,θ2(old)=i=1nQiθ2,θ2(old)(19)
=i=1nmi=1Mhi,miΥi;θ2fiyi,xi,Υi,mi;θ2(old)dΥimi=1Mfiyi,xi,Υi,mi;θ2(old)dΥi.

The maximum of (19) is attained at θ2(upd) and can be expressed in terms of the following integrals:

fiyi,xi,Υi,m;θ2(old)dΥiρi,m
ξifiyi,xi,Υi,m;θ2(old)dΥiρξ;i,m
ηifiyi,xi,Υi,m;θ2(old)d\Upsilonaiρη,:i,m
\xiai\xiaiTfiyi,xi,Υi,m;θ2(old)dΥiΔξξ;i,m
ηiηiTfiyi,xi,Υi,m;θ2(old)d\UpsilonaiΔ\etaa\etaa;i,m
(20) \xiaiηiTfiyi,xi,Υi,m;θ2(old)d\UpsilonaiΔξ\etaa;i,m,(20)

and the current estimate of the effective number of observations per component

nˆ(m)=i=1nρi,ml=1Mρi,l.

Then the updated parameter vector θ2(upd) is composed of

τ(upd)(m)=1nˆ(m)i=1nρ\etaa,:i,mΓ(old)(m)ρξ;i,ml=1Mρi,l
Γ(upd)(m)=i=1nΔξ\etaa;i,mT\tauu(old)(m)\rhoaξ;i,mTl=1Mρi,li=1nΔξξ;i,ml=1Mρi,l1
Φ(upd)(m)=\mua(old)(m)\mua(old)(m)T+1nˆ(m)i=1nΔξξ;i,m\mua(old)(m)ρξ;i,mTρξ;i,m\mua(old)(m)Tl=1Mρi,l
\mua(upd)(m)=1nˆ(m)i=1nρξ;i,ml=1Mρi,l
π(upd)(m)=nˆ(m)n,

for m=1,,M, and

Ψ(upd)=1ni=1n\Omegaal=1Mρi,l,

where

\Omegaa=l=1MΔ\etaa\etaa;i,lρ\etaa,:i,lτ(old)(l)Tτ(old)(l)ρ\etaa,:i,lTΔξ\etaa;i,lTΓ(old)(l)TΓ(old)(l)Δξ\etaa;i,l
+τ(old)(l)τ(old)(l)Tρi,m+τ(old)(l)ρξ;i,lTΓ(old)(l)T+Γ(old)(l)ρξ;i,lτ(old)(l)T+Γ(old)(l)Δξξ;i,lΓ(old)(l)T.

The integrals in (36) are intractable. For ρi,m the AGHQ approximation is applied as in the previous section. For the other integrals, the same weights and quadrature points are used. Integrals should still be asymptotically well approximated since the original integrand is multiplied by a polynomial of degree at most 2, as argued by Naylor and Smith (Citation1982). Once the updated estimates are obtained the previous estimates θ2(old) are replaced by the updated θ2(upd) and the procedure is repeated a number of times.