A Semiparametric Approach for Structural Equation Modeling with Ordinal Data


ABSTRACT

There is currently a lack of methods for nonlinear structural equation modeling (NSEM) with nonparametric relationships between latent variables when the data are ordinal. To address this gap, a semiparametric approach for flexible NSEMs that does not require a parametric form is developed for ordinal data. A finite mixture of structural equation models (SEMM) is applied indirectly to model the conditional mean of the endogenous latent variables. In this context, the latent classes are not interpreted as substantive groups of observations; rather, they serve as a means to represent a flexible nonlinear function as a set of locally linear functions that together approximate a globally nonlinear function. The proposed estimation method is a hybrid of direct maximization and the expectation-maximization algorithm. Two simulation studies show that the parameter estimates have low bias and that a nonlinear functional form is satisfactorily recovered by the proposed approach.


Introduction

In recent decades, structural equation modeling (SEM) has become accessible to applied researchers, particularly in the social sciences, thanks to software packages such as LISREL (Jöreskog & Sörbom, 1993, 1996) and Mplus (Muthén & Muthén, 2020), which model the covariance structure under the assumption of normally distributed latent variables. In this paper, exogenous variables are variables that are not determined by other variables in the model, whereas endogenous variables are variables that are determined by other variables in the model. Most SEM approaches have historically focused on linear relationships between latent variables. However, more flexible functions are frequently needed to capture the true relationships; see Ajzen and Madden (1986), Ajzen (1991), and Agustin and Singh (2005) for examples. When the structural relationship is nonlinear, the latent endogenous variables are not normally distributed, so the parameters cannot be determined from the covariance structure alone. New approaches were therefore developed, starting with Kenny and Judd (1984), who used products of indicators as indicators of product and interaction terms. This approach was further developed by Jaccard and Wan (1995), Jöreskog and Yang (1996), Kelava and Brandt (2009), and Wall and Amemiya (2001). By forming products of indicators for interaction and quadratic terms, nonlinear structural models could be estimated with the software available at the time. The product indicator approaches have been criticized for their ad hoc nature and for the arbitrariness in choosing which indicators to include (e.g., Wall, 2009; Harring et al., 2012).
Since then, the methodological development for estimating nonlinear structural equation models (NSEM) has continued with maximum likelihood-based approaches, among which Klein and Moosbrugger (2000) and Klein and Muthén (2007) are the most established. Bayesian approaches (Arminger & Muthén, 1998; Zhu & Lee, 1999) and two-stage least squares approaches (Bollen, 1995, 1996) have also been considered. Most of the development for NSEM has focused on models with continuous indicators, although there are exceptions such as Lee and Zhu (2000), Lee and Song (2003), Rizopoulos and Moustaki (2008), and Muthén (2001).

In all of the methods mentioned above, the functional form of the structural model is specified in advance. It is sometimes desirable to relax this requirement in order to model a more general family of functions. Nonparametric approaches, such as Song et al. (2014), and semiparametric approaches, e.g., Bauer (2005), Pek et al. (2011), Song and Lu (2010), Finch (2015), and Guo et al. (2012), make it possible to estimate models with unspecified functional forms. In finite mixtures of structural equation models (SEMMs), e.g., Bauer (2005) and Pek et al. (2011), the latent exogenous variables are modeled as mixtures of multivariate normal distributions. Mixture-based semiparametric approaches serve two purposes. In the direct approach, the components are interpreted as real subpopulations, and the component-specific structural parameters are therefore interpreted as parameters of the corresponding subpopulation. In the indirect approach, by contrast, the components are not interpreted as subpopulations in any substantive sense. Instead, they provide a means to model nonnormal latent variables as mixtures of normal distributions, as in the nonlinear structural equation mixture model (NSEMM) of Kelava et al. (2014). This makes estimation within a parametric model robust to nonnormal exogenous latent variables. Another application of the indirect approach, as in the SEMM of Bauer (2005) and Pek et al. (2011), is to model a nonlinear relationship as a weighted combination of component-specific linear functions. In this approach, both the nonlinearity and the nonnormality of the latent variables are captured by the normal mixture, which yields flexible estimation that is robust to nonnormal latent exogenous variables.

At present, there is a lack of non- and semiparametric approaches to latent variable modeling with ordinal data, although a few exceptions exist, e.g., Song et al. (2013), who extended the penalized Bayesian P-splines approach of Song and Lu (2010) to mixed data types. Treating ordinal data as continuous has been shown to produce bias (Finney & DiStefano, 2006; Rhemtulla et al., 2012). In this article, a semiparametric approach based on the work of Bauer (2005) and Pek et al. (2011) is developed for ordinal indicators. Estimation is by quasi-maximum likelihood, combining the direct maximization of the approximated log-likelihood of Jin et al. (2020) with the expectation-maximization (EM) algorithm of Dempster et al. (1977).

The article is organized as follows. First, the model is outlined, followed by the estimation procedure. Two simulation studies are then reported: the first shows that the component-specific parameters are estimated accurately, and the second shows that a structural model of quadratic form is satisfactorily recovered by the proposed approach. The article ends with a discussion and conclusion. Technical details are given in an appendix.

The semiparametric model

In the SEMM proposed by Bauer (2005) and Pek et al. (2011), the conditional expected value $E(\eta \mid \xi)$ is modeled as a weighted sum of linear components:

$$E(\eta \mid \xi) = \sum_{m=1}^{M} P(m \mid \xi)\left(\tau^{(m)} + \Gamma^{(m)}\xi\right), \tag{1}$$

where $\xi$ is a $K_\xi$-dimensional vector of latent exogenous variables, $\eta$ is a $K_\eta$-dimensional vector of latent endogenous variables, $M$ is the number of components in the mixture, $\tau^{(m)}$ is a $K_\eta$-vector of intercepts for component $m$, $\Gamma^{(m)}$ is a $K_\eta \times K_\xi$ matrix of linear effects for component $m$, and the conditional probability of component $m$ is

$$P(m \mid \xi) = \frac{\pi^{(m)} N_{K_\xi}\left(\xi; \mu^{(m)}, \Phi^{(m)}\right)}{\sum_{l=1}^{M} \pi^{(l)} N_{K_\xi}\left(\xi; \mu^{(l)}, \Phi^{(l)}\right)}, \tag{2}$$

where $N_{K_\xi}(\cdot\,; \mu, \Phi)$ is the $K_\xi$-variate normal density function with mean vector $\mu$ and covariance matrix $\Phi$, $\pi^{(m)}$ is the population proportion of component $m$, $\mu^{(m)}$ is the $K_\xi \times 1$ population mean vector of $\xi$ in component $m$, and $\Phi^{(m)}$ is the $K_\xi \times K_\xi$ population covariance matrix of $\xi$ in component $m$.
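To make (1) and (2) concrete, the following Python sketch evaluates the component weights $P(m \mid \xi)$ and the conditional mean $E(\eta \mid \xi)$ for given parameter values. It is an illustration only, not the authors' R implementation; the function names and the toy parameter values are hypothetical. With two equally weighted components centered at $\mp 1$ with local slopes $\mp 1$, the mixture reproduces the smooth nonlinear curve $\xi \tanh(\xi)$ from two straight lines, which is the indirect use of the mixture described above.

```python
import numpy as np

def mvn_logpdf(xi, mu, Phi):
    """Log of the multivariate normal density N_K(xi; mu, Phi) at a single point."""
    k = len(mu)
    diff = xi - mu
    _, logdet = np.linalg.slogdet(Phi)
    quad = diff @ np.linalg.solve(Phi, diff)
    return -0.5 * (k * np.log(2.0 * np.pi) + logdet + quad)

def component_probs(xi, pis, mus, Phis):
    """Equation (2): P(m | xi) for m = 1, ..., M."""
    logw = np.array([np.log(p) + mvn_logpdf(xi, m, S)
                     for p, m, S in zip(pis, mus, Phis)])
    logw -= logw.max()  # subtract the maximum before exponentiating to avoid underflow
    w = np.exp(logw)
    return w / w.sum()

def conditional_mean(xi, pis, mus, Phis, taus, Gammas):
    """Equation (1): E(eta | xi) = sum_m P(m | xi) (tau^(m) + Gamma^(m) xi)."""
    w = component_probs(xi, pis, mus, Phis)
    return sum(wm * (t + G @ xi) for wm, t, G in zip(w, taus, Gammas))

# Toy example (hypothetical values): K_xi = K_eta = 1, M = 2
pis = [0.5, 0.5]
mus = [np.array([-1.0]), np.array([1.0])]
Phis = [np.eye(1), np.eye(1)]
taus = [np.zeros(1), np.zeros(1)]
Gammas = [np.array([[-1.0]]), np.array([[1.0]])]
for x in (-2.0, 0.0, 2.0):
    print(x, conditional_mean(np.array([x]), pis, mus, Phis, taus, Gammas)[0])
```

Working on the log scale in `component_probs` keeps the weights well defined even when all component densities are tiny, which happens for values of $\xi$ far from every $\mu^{(m)}$.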

Bauer (2005) and Pek et al. (2011) consider only continuous data. In this work, their approach is extended to ordinal data. Define $X_j$ to be the $j$th ordinal indicator of $\xi$ and $Y_j$ to be the $j$th ordinal indicator of $\eta$.¹ The numbers of indicators of $\xi$ and $\eta$ are denoted by $\rho_x$ and $\rho_y$, respectively, so that $x$ and $y$ are the $\rho_x$- and $\rho_y$-dimensional realizations of $X_j$, $j = 1, \dots, \rho_x$, and $Y_j$, $j = 1, \dots, \rho_y$, respectively. The measurement model relating the continuous latent variables $\xi$ and $\eta$ to the ordinal observed variables $x$ and $y$ is then

$$g_j^{(x)}\left(P(X_j \le u_j \mid \xi)\right) = \alpha_{x,j}^{(u_j)} - \beta_{x,j}^T\xi, \quad u_j \in \{1, \dots, U_j\}, \; j = 1, \dots, \rho_x, \tag{3}$$

$$g_j^{(y)}\left(P(Y_j \le v_j \mid \eta)\right) = \alpha_{y,j}^{(v_j)} - \beta_{y,j}^T\eta, \quad v_j \in \{1, \dots, V_j\}, \; j = 1, \dots, \rho_y, \tag{4}$$

where $g_j^{(x)}$ and $g_j^{(y)}$ are link functions, $\beta_{x,j}$ is the $K_\xi$-vector of factor loadings of indicator $X_j$, $\beta_{y,j}$ is the $K_\eta$-vector of factor loadings of indicator $Y_j$, and $U_j$ and $V_j$ are the numbers of categories of the corresponding indicators. Here we let $\alpha_{x,j}^{(U_j)} = \infty$ and $\alpha_{y,j}^{(V_j)} = \infty$, so that the cumulative probability of the highest category is 1 and each category probability is the difference between two consecutive cumulative probabilities.
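With a probit link, as used in the simulation studies below, the inverse link is the standard normal distribution function $\Phi$, and each category probability is the difference of two consecutive cumulative probabilities. The Python sketch below computes $P(X_j = u \mid \xi)$ for a single indicator; the function names are hypothetical, and the finite thresholds are assumed to be strictly increasing.

```python
import math

def norm_cdf(z):
    """Standard normal CDF; math.erfc also handles z = +/-inf."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def category_probs(alpha, beta, xi):
    """P(X = u | xi), u = 1, ..., U, when Phi^{-1}(P(X <= u | xi)) = alpha^(u) - beta^T xi.

    alpha: finite increasing thresholds alpha^(1), ..., alpha^(U-1); alpha^(U) = +inf is implicit.
    beta, xi: loading vector and latent vector of equal length K.
    """
    lin_pred = sum(b * x for b, x in zip(beta, xi))         # beta^T xi
    cum = [norm_cdf(a - lin_pred) for a in alpha] + [1.0]   # P(X <= u | xi), u = 1, ..., U
    return [cum[0]] + [cum[u] - cum[u - 1] for u in range(1, len(cum))]

probs = category_probs([0.0, 2.97, 3.85, 5.24], [2.0], [1.5])
print([round(p, 4) for p in probs], sum(probs))
```

The thresholds and loading in the usage line are the values listed for the first $x$ indicator in Simulation Study 2, used here purely as illustrative numbers. A logit link would only replace `norm_cdf` with the logistic function.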

Maximum likelihood estimation

The joint density function of $y$, $x$, $\eta$, $\xi$, and $m$ is denoted by $f(y, x, \eta, \xi, m)$. We assume that the observed variables $y$ and $x$ are conditionally independent given the latent variables $\eta$ and $\xi$ and the component $m$. By the measurement model (Equations (3) and (4)), the distribution of $y$ is determined by $\eta$ and the distribution of $x$ is determined by $\xi$; the measurement model does not depend on the component $m$. The SEMM (1) implies that the distribution of $\eta$ is determined by $\xi$ and the component $m$, and that the distribution of $\xi$ is determined by the component $m$. Thus, the joint density function factorizes as

$$f(y, x, \eta, \xi, m) = f(y \mid \eta)\, f(x \mid \xi)\, f(\eta \mid \xi, m)\, f(\xi \mid m)\, f(m), \tag{5}$$

where $f(y \mid \eta)$ and $f(x \mid \xi)$ are given by the measurement models (3) and (4),

$$\xi \mid m \sim N_{K_\xi}\left(\mu^{(m)}, \Phi^{(m)}\right), \tag{6}$$

$$f(m) = \pi^{(m)}, \quad m = 1, \dots, M, \tag{7}$$

subject to the constraint $\sum_{m=1}^{M} \pi^{(m)} = 1$. The conditional distribution of $\eta$ given $\xi$ and component $m$ is assumed to be

$$\eta \mid \xi, m \sim N_{K_\eta}\left(\tau^{(m)} + \Gamma^{(m)}\xi, \Psi\right). \tag{8}$$

For simplicity, the residual covariance matrix $\Psi$ is the same for all $M$ components in the current setting; this restriction could easily be lifted. Neither the latent variables $\xi_i$ and $\eta_i$ nor the components $m_i$ are observed, where the subscript $i$ refers to observation $i$. Hence, under the assumption of independent observations, the objective function to be maximized is the observed-data log-likelihood

$$\ell(\theta) = \sum_{i=1}^{n} \log \sum_{m_i=1}^{M} \int f(y_i, x_i, \Upsilon_i, m_i)\, d\Upsilon_i, \tag{9}$$

where $\theta$ is the vector of unrestricted parameters, $n$ is the number of observations, and $\Upsilon_i^T = (\eta_i^T, \xi_i^T)$ is the vector of latent variables of the $i$th observation. Similarly, $y_i$ and $x_i$ are the vectors of observed indicators of the $i$th observation, and $m_i$ denotes the (unobserved) component of the $i$th observation.
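The structure of (9), a logarithm of a sum over components of integrals over the latent variables, can be illustrated in a reduced setting. The Python sketch below assumes a single exogenous latent variable ($K_\xi = 1$), only $x$ indicators, a probit link, and conditional independence of the indicators given $\xi$, so that $f(x \mid \xi)$ is a product of category probabilities. Each component's integral is approximated with ordinary (non-adaptive) Gauss-Hermite quadrature rather than the AGHQ used in the article, and the sum over components is evaluated on the log scale. All names are hypothetical.

```python
import math
import numpy as np

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def cat_prob(u, alpha, beta, xi):
    """P(X = u | xi) under a cumulative probit model; alpha holds alpha^(1), ..., alpha^(U-1)."""
    a = [-math.inf] + list(alpha) + [math.inf]
    return norm_cdf(a[u] - beta * xi) - norm_cdf(a[u - 1] - beta * xi)

def loglik_obs(x_i, alphas, betas, pis, mus, variances, q=20):
    """log sum_m pi^(m) * integral of prod_j P(X_j = x_ij | xi) N(xi; mu^(m), var^(m)) d xi."""
    z, w = np.polynomial.hermite.hermgauss(q)
    log_terms = []
    for pi_m, mu_m, var_m in zip(pis, mus, variances):
        xis = mu_m + math.sqrt(2.0 * var_m) * z   # Gauss-Hermite nodes mapped to N(mu_m, var_m)
        lik = np.ones(q)
        for u, a, b in zip(x_i, alphas, betas):   # product over indicators (local independence)
            lik *= np.array([cat_prob(u, a, b, xi) for xi in xis])
        integral = float(np.sum(w * lik)) / math.sqrt(math.pi)
        log_terms.append(math.log(pi_m) + math.log(integral))
    top = max(log_terms)                          # log-sum-exp over the components
    return top + math.log(sum(math.exp(t - top) for t in log_terms))
```

Summing `loglik_obs` over observations gives the approximated $\ell(\theta)$ for this reduced model. For a single indicator the integral has the closed form $\Phi\big((\alpha^{(u)} - \beta\mu)/\sqrt{1 + \beta^2\sigma^2}\big) - \Phi\big((\alpha^{(u-1)} - \beta\mu)/\sqrt{1 + \beta^2\sigma^2}\big)$ for $\xi \sim N(\mu, \sigma^2)$, which is a convenient check.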

Estimation

The aim is to maximize $\ell(\theta)$ with respect to $\theta$. With missing data (or latent variables), it is common to use the EM algorithm (Dempster et al., 1977), which is, however, known for its slow convergence. Jin et al. (2020) proposed estimating an extended NSEM by a quasi-Newton method in which the exact gradient of the approximated log-likelihood is computed, so that the maximum of $\ell(\theta)$ is found without EM; this is referred to as direct maximization (DM). For the extended NSEM, we expect DM to be computationally more efficient than EM. Instead of relying purely on the EM algorithm, we propose a hybrid of EM and the DM of Jin et al. (2020) to estimate the SEMM. The EM part is expected to be reasonably computationally efficient, since the parameter updates in each EM step have closed-form solutions. A sketch of the proposed algorithm is given here; a detailed description of the procedure can be found in the appendix.

Let $U^{(\max)}$ and $V^{(\max)}$ be the maximum numbers of categories among the indicators of $\xi$ and $\eta$, respectively. Let $\alpha_x$ be the $\rho_x \times (U^{(\max)} - 1)$ matrix with element $\alpha_{x,j}^{(u_j)}$ in row $j$ and column $u_j$, and let $\alpha_y$ be the $\rho_y \times (V^{(\max)} - 1)$ matrix with element $\alpha_{y,j}^{(v_j)}$ in row $j$ and column $v_j$. Let $\beta_x$ be the $\rho_x \times K_\xi$ matrix with row vector $\beta_{x,j}^T$ in row $j$, and let $\beta_y$ be the $\rho_y \times K_\eta$ matrix with $\beta_{y,j}^T$ in row $j$. For convenience, the vector of unrestricted parameters is partitioned as $\theta^T = (\theta_1^T, \theta_2^T)$, where $\theta_1$ contains the unrestricted elements of $\alpha_y$, $\beta_y$, $\alpha_x$, and $\beta_x$, and $\theta_2$ contains the unrestricted parameters in $\Psi$, $\tau^{(1)}, \dots, \tau^{(M)}$, $\Gamma^{(1)}, \dots, \Gamma^{(M)}$, $\Phi^{(1)}, \dots, \Phi^{(M)}$, $\mu^{(1)}, \dots, \mu^{(M)}$, and $\pi^{(1)}, \dots, \pi^{(M)}$. Thus, $\theta_1$ consists only of the component-invariant parameters of the measurement model. We propose to maximize $\ell(\theta_1, \theta_2)$ with respect to $\theta_1$ for fixed $\theta_2$ by DM. Since no closed-form updates of $\theta_1$ are available, a quasi-Newton method is used to update $\theta_1$ numerically. For fixed $\theta_1$, $\theta_2$ is updated by the EM algorithm. EM is used for these parameters because the logarithm of the sum over the components $m_i$ in (9) makes the gradient of $\ell(\theta)$ unstable. Furthermore, in the proposed version of EM, the conditional maximization steps have closed-form solutions.

Both the closed-form updates in EM and the function $\ell(\theta_1, \theta_2)$ maximized in DM involve intractable integrals, so approximations are required. In DM, the log-likelihood $\ell(\theta_1, \theta_2)$ is approximated by adaptive Gauss-Hermite quadrature (AGHQ; Liu & Pierce, 1994); the appendix gives the expression of the AGHQ approximation. For a unimodal integrand, the error rate of the AGHQ approximation is $O\big((\rho_x + \rho_y)^{-\lfloor (q+2)/3 \rfloor}\big)$ (Jin & Andersson, 2020), where $\lfloor \cdot \rfloor$ denotes the largest integer less than or equal to its argument and $q$ is the number of quadrature points per latent variable. For the extended NSEM investigated by Jin et al. (2020), the AGHQ approximation was accurate for $q \ge 5$.
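The following Python sketch shows the idea behind AGHQ for a single latent variable: locate the mode of the integrand, use the curvature of its logarithm at the mode to center and scale the Gauss-Hermite nodes, and sum. It is a one-dimensional illustration under our own assumptions (finite-difference derivatives and an unsafeguarded Newton search that presumes a smooth unimodal integrand), not the multivariate implementation described in the appendix; `aghq_1d` is a hypothetical name. As a check, it integrates $\Phi(\alpha - \beta\xi)\phi(\xi)$, the probability of the lowest category of a probit indicator when $\xi \sim N(0, 1)$, whose exact value is $\Phi(\alpha/\sqrt{1 + \beta^2})$.

```python
import math
import numpy as np

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def aghq_1d(log_f, q=5, x0=0.0, h=1e-4, max_iter=100):
    """Adaptive Gauss-Hermite approximation of the integral of exp(log_f(x)) over the real line."""
    def d1(x):  # central finite-difference first derivative of log_f
        return (log_f(x + h) - log_f(x - h)) / (2.0 * h)

    def d2(x):  # central finite-difference second derivative of log_f
        return (log_f(x + h) - 2.0 * log_f(x) + log_f(x - h)) / h**2

    x = x0
    for _ in range(max_iter):  # Newton iterations for the mode of log_f
        step = d1(x) / d2(x)
        x -= step
        if abs(step) < 1e-10:
            break
    sigma = 1.0 / math.sqrt(-d2(x))            # scale from the curvature at the mode
    z, w = np.polynomial.hermite.hermgauss(q)
    nodes = x + math.sqrt(2.0) * sigma * z     # nodes centered and scaled at the mode
    vals = np.array([log_f(v) for v in nodes])
    return math.sqrt(2.0) * sigma * float(np.sum(w * np.exp(vals + z**2)))

alpha, beta = 1.0, 2.0
log_f = lambda x: (math.log(norm_cdf(alpha - beta * x))
                   - 0.5 * x * x - 0.5 * math.log(2.0 * math.pi))
exact = norm_cdf(alpha / math.sqrt(1.0 + beta**2))
for q in (1, 3, 5, 10):
    print(q, aghq_1d(log_f, q=q), exact)
```

With `q=1` the rule reduces to the Laplace approximation; increasing `q` refines it. For a Gaussian integrand the adaptive rule is exact for any `q`, because the rescaled integrand becomes constant.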

As in Jin et al. (2020), the quasi-Newton method uses the exact gradient of the approximated log-likelihood. In EM, the integrals in the closed-form solutions are also approximated by AGHQ, and these approximations are expected to be accurate for large samples and sufficiently large $q$; see the appendix for details. Pseudocode for iteration $k$ of the proposed approach is given in Algorithm 1.

Algorithm 1 Proposed approach: Hybrid of EM and DM

while convergence criterion not met do

(a) DM step: conditional on $\theta_2^{(k)}$, obtain $\theta_1^{(k+1)}$ by maximizing the approximated $\ell(\theta_1, \theta_2^{(k)})$ with a quasi-Newton method

(b) EM step: conditional on $\theta_1^{(k+1)}$, obtain $\theta_2^{(k+1)}$ from the approximated closed-form updates

(c) Set $k \leftarrow k + 1$

end
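The alternating structure of Algorithm 1 can be sketched in Python as below. The two steps are passed in as functions, because the actual DM step (a quasi-Newton maximization of the AGHQ-approximated log-likelihood) and EM step (approximated closed-form updates) are specified in the appendix; `hybrid_fit`, `dm_step`, and `em_step` are hypothetical names. The stopping rule is the one used in the simulations: stop when the largest absolute parameter change between consecutive iterations falls below $5 \cdot 10^{-4}$. The toy usage replaces both steps with exact block updates of a concave quadratic, for which this kind of alternation converges to the joint maximum.

```python
import numpy as np

def hybrid_fit(theta1, theta2, dm_step, em_step, tol=5e-4, max_iter=1000):
    """Alternate (a) a DM update of theta1 and (b) an EM update of theta2 until convergence."""
    theta1, theta2 = np.asarray(theta1, float), np.asarray(theta2, float)
    for k in range(max_iter):
        new1 = np.asarray(dm_step(theta1, theta2), float)  # (a) theta2 held at theta2^(k)
        new2 = np.asarray(em_step(new1, theta2), float)    # (b) conditional on theta1^(k+1)
        change = max(np.max(np.abs(new1 - theta1)), np.max(np.abs(new2 - theta2)))
        theta1, theta2 = new1, new2
        if change < tol:  # maximum absolute change between consecutive iterations
            return theta1, theta2, k + 1
    return theta1, theta2, max_iter

# Toy objective f(a, b) = -(a - 1)^2 - (b + 1)^2 - 0.5 (a - b)^2, maximized at a = 0.5, b = -0.5.
dm = lambda t1, t2: np.array([(2.0 + t2[0]) / 3.0])  # argmax over a for fixed b
em = lambda t1, t2: np.array([(t1[0] - 2.0) / 3.0])  # argmax over b for fixed a
t1, t2, iters = hybrid_fit([0.0], [0.0], dm, em)
print(t1, t2, iters)
```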

Restrictions

In the SEM framework, a common assumption is that $E(\Upsilon) = 0$. In the proposed semiparametric setting, it is more convenient to impose the restrictions in the measurement model, as explained below. Whenever appropriate, it is still possible to change variables to $\Upsilon^* = \Upsilon - E(\Upsilon)$, where $E(\Upsilon)$ is a function of the model parameters; the transformed variables satisfy $E(\Upsilon^*) = 0$. The common assumption $E(\xi) = 0$ corresponds to $K_\xi$ restriction equations on the proportions and means of the exogenous latent variables, and the corresponding maximization equations in the EM step then lack closed-form solutions. In the current work, we instead impose $K_\xi$ restrictions on $\alpha_x$. Specifically, we impose $\alpha_{x,s}^{(1)} = 0$, where $s$ indexes a row of $\alpha_x$ whose corresponding row in $\beta_x$ contains a fixed element that sets the scale of a latent variable. There is one such restriction per latent exogenous variable, which gives $K_\xi$ restrictions. Consider the change of variables $\xi^* = \xi - \mu_\xi$, where $\mu_\xi = E(\xi) = \sum_{m=1}^{M} \pi^{(m)}\mu^{(m)}$, which is in general nonzero. The translated variables satisfy $E(\xi^*) = 0$. The measurement model for, e.g., the indicator $x_s$ is then

$$g_s^{(x)}\left(P(X_s \le 1 \mid \xi)\right) = \alpha_{x,s}^{(1)} - \beta_{x,s}^T\xi,$$

where $\alpha_{x,s}^{(1)} = 0$ and $\xi = \xi^* + \mu_\xi$. Similarly,

$$\mu_\eta = E(\eta) = \sum_{m=1}^{M} \pi^{(m)}\left(\tau^{(m)} + \Gamma^{(m)}\mu^{(m)}\right) \neq 0$$

in general. Correspondingly, $K_\eta$ restrictions are also needed for $\eta$, and for this reason the restriction $\tau^{(1)} = 0$ is imposed. The change of variables $\eta^* = \eta - \mu_\eta$ then gives $E(\eta^*) = 0$. Thus, it is straightforward to move between the variables $(\eta^T, \xi^T)$, for which the estimation procedure is designed, and $(\eta^{*T}, \xi^{*T})$, which have mean zero. For example, the population latent regression in Simulation Study 2 in the next section is $\eta^* = -0.5 + 0.5\xi^* + 0.5\xi^{*2} + \zeta$, which is based on the zero-mean parametrization $E(\xi^*) = E(\eta^*) = 0$. When the ordinal values are generated, the linear predictor for the first indicator is $\alpha_{x,1}^{(1)} - \beta_{x,1}^T\xi^*$, where $\alpha_{x,1}^{(1)} \neq 0$. During estimation in our parametrization, we restrict $\alpha_{x,1}^{(1)} = 0$ and $\tau^{(1)} = 0$, so the linear predictor for the first indicator becomes $-\beta_{x,1}^T\xi$ and the latent regression in the first component is $\Gamma^{(1)}\xi$. Consequently, neither $E(\eta)$ nor $E(\xi)$ is necessarily zero; instead, they are $\mu_\eta = \sum_{m=1}^{M} \pi^{(m)}(\tau^{(m)} + \Gamma^{(m)}\mu^{(m)})$ and $\mu_\xi = \sum_{m=1}^{M} \pi^{(m)}\mu^{(m)}$, respectively. The estimates are easily transformed from our parametrization to the zero-mean parametrization using $\xi^* = \xi - \mu_\xi$ and $\eta^* = \eta - \mu_\eta$. In particular, the loadings $\beta$ and the matrices $\Gamma^{(m)}$, for all $m$, remain the same. The measurement models in the two parametrizations are related by

$$\alpha_{x,j}^{(u_j)} - \beta_{x,j}^T\xi = \left(\alpha_{x,j}^{(u_j)} - \beta_{x,j}^T\mu_\xi\right) - \beta_{x,j}^T\xi^*,$$

$$\alpha_{y,j}^{(v_j)} - \beta_{y,j}^T\eta = \left(\alpha_{y,j}^{(v_j)} - \beta_{y,j}^T\mu_\eta\right) - \beta_{y,j}^T\eta^*.$$

For the latent regression, our parametrization yields $E(\eta \mid \xi, m) = \tau^{(m)} + \Gamma^{(m)}\xi$, which implies that

$$E(\eta^* \mid \xi^*, m) = \left(-\mu_\eta + \tau^{(m)} + \Gamma^{(m)}\mu_\xi\right) + \Gamma^{(m)}\xi^*.$$

Hence, the conditional mean in the zero-mean parametrization can be obtained by

$$E(\eta^* \mid \xi^*) = \sum_{m=1}^{M} P(m \mid \xi^*)\left(\tau^{(m)} + \Gamma^{(m)}\mu_\xi + \Gamma^{(m)}\xi^*\right) - \mu_\eta,$$

where $P(m \mid \xi^*) = P(m \mid \xi)$ is given by (2) evaluated at $\xi = \xi^* + \mu_\xi$.
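The back-transformation above amounts to a few lines of linear algebra. The Python sketch below, with hypothetical function names, computes $\mu_\xi$ and $\mu_\eta$, shifts the thresholds, and evaluates $E(\eta^* \mid \xi^*)$ given the component weights $P(m \mid \xi)$ at $\xi = \xi^* + \mu_\xi$, which the caller is assumed to supply (e.g., computed from (2)). The arrays follow the dimensions defined earlier: $\alpha_x$ is $\rho_x \times (U^{(\max)} - 1)$ and $\beta_x$ is $\rho_x \times K_\xi$.

```python
import numpy as np

def to_zero_mean(pis, mus, taus, Gammas, alpha_x, beta_x, alpha_y, beta_y):
    """Map estimates from the restricted parametrization to the one with E(xi*) = E(eta*) = 0."""
    mu_xi = sum(p * m for p, m in zip(pis, mus))                                   # E(xi)
    mu_eta = sum(p * (t + G @ m) for p, t, G, m in zip(pis, taus, Gammas, mus))  # E(eta)
    # Thresholds absorb the mean shift; beta and Gamma^(m) are unchanged.
    alpha_x_star = alpha_x - (beta_x @ mu_xi)[:, None]
    alpha_y_star = alpha_y - (beta_y @ mu_eta)[:, None]
    return mu_xi, mu_eta, alpha_x_star, alpha_y_star

def cond_mean_star(xi_star, weights, taus, Gammas, mu_xi, mu_eta):
    """E(eta* | xi*) = sum_m P(m | xi) (tau^(m) + Gamma^(m) (xi* + mu_xi)) - mu_eta."""
    xi = xi_star + mu_xi
    return sum(w * (t + G @ xi) for w, t, G in zip(weights, taus, Gammas)) - mu_eta
```

Because the weights sum to one, subtracting $\mu_\eta$ once outside the sum is equivalent to subtracting it inside each component term.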

Simulations

Two simulation studies are performed to investigate the performance of the proposed method. In the first study, data are generated from prespecified true model parameters, and the accuracy of the parameter estimates is investigated. In the second study, data are generated from a quadratic function, so no true mixture parameters are known (or even exist); instead, the degree to which the procedure captures the true nonlinear function is investigated. The code is written in R (R Core Team, 2018), is available upon request, and will be released as an R package. In the simulations, estimation stops when the maximum absolute difference in the parameters between two consecutive iterations is less than $5 \cdot 10^{-4}$. In each simulation study, 1000 samples are generated and estimated using 5 quadrature points per latent variable. For evaluation in Simulation Study 1, the relative bias $RB_p = 100 R^{-1} \sum_{i=1}^{R} \theta_p^{-1}(\hat\theta_{p,i} - \theta_p)$, the root mean squared error $RMSE_p = \sqrt{R^{-1}\sum_{i=1}^{R}(\hat\theta_{p,i} - \theta_p)^2}$, and the standard deviation $SD_p = \sqrt{R^{-1}\sum_{i=1}^{R}(\hat\theta_{p,i} - \bar\theta_p)^2}$ are calculated for each structural parameter $\theta_p$, where $R$ is the number of replications, $\hat\theta_{p,i}$ is the estimate in the $i$th replication, and $\bar\theta_p$ is the average estimate of $\theta_p$.
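The three criteria translate directly into code. In the Python sketch below (hypothetical function name and hypothetical estimates), SD uses the divisor $R$ as in the definitions above, so that $RMSE_p^2 = SD_p^2 + (\bar\theta_p - \theta_p)^2$ holds exactly; this decomposition is what allows the RMSE to be split into a variance part and a bias part when the results are interpreted.

```python
import numpy as np

def mc_summary(estimates, true_value):
    """Relative bias (in percent), SD, and RMSE of R replicated estimates of one parameter."""
    est = np.asarray(estimates, dtype=float)
    rb = 100.0 * np.mean((est - true_value) / true_value)
    sd = np.sqrt(np.mean((est - est.mean()) ** 2))    # divisor R, not R - 1
    rmse = np.sqrt(np.mean((est - true_value) ** 2))
    return rb, sd, rmse

# Hypothetical estimates from R = 4 replications of a parameter whose true value is 1.0
rb, sd, rmse = mc_summary([0.9, 1.1, 1.2, 1.0], true_value=1.0)
print(rb, sd, rmse)
```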

Simulation study 1

Simulation design

In the first simulation study, the two latent exogenous variables ($\xi_1$ and $\xi_2$) are generated from a mixture of two bivariate normal components with population proportions $\pi^{(1)} = \pi^{(2)} = 0.5$. Within each component, the variances are $0.5$ and the covariance is $-0.2$. In this setting, the component means ($\mu^{(1)}$ and $\mu^{(2)}$) are not free parameters but functions of other parameters, as explained in the previous section. One latent endogenous variable $\eta$ is generated conditional on $\xi_1$ and $\xi_2$ from a normal distribution with conditional mean given by (1) and (2) and error variance $\Psi_{11} = 1.0$. The generated latent variables are then used to generate the ordinal indicators $y_i$ and $x_i$ from the measurement model given by (3) and (4) with the probit link function. To set the location of $\xi$, $\alpha_{x,1}^{(1)} = \alpha_{x,4}^{(1)} = 0$. Similarly, the local intercept is restricted to $\tau_1^{(1)} = 0$ to set the location of $\eta$. Furthermore, the zero elements in $\beta_x$ are fixed at zero, which gives a simple structure.² To set the scale of the latent variables, $\beta_{x,11} = \beta_{x,42} = \beta_{y,11} = 2$. The parameters of the measurement model are then

$$
\begin{aligned}
\alpha_x^T &= \begin{pmatrix} 0 & -0.04 & -0.08 & 0 & -0.04 & -0.07 \\ 3.75 & 3.37 & 2.99 & 3.75 & 3.37 & 2.99 \\ 4.96 & 4.47 & 3.99 & 4.96 & 4.47 & 3.99 \\ 6.38 & 5.79 & 5.21 & 6.38 & 5.80 & 5.22 \end{pmatrix}, \\
\beta_x^T &= \begin{pmatrix} 2 & 1.80 & 1.60 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 & 1.80 & 1.60 \end{pmatrix}, \\
\alpha_y^T &= \begin{pmatrix} -2.83 & -2.58 & -2.32 \\ -0.87 & -0.79 & -0.71 \\ 1.78 & 1.61 & 1.45 \\ 4.25 & 3.86 & 3.48 \end{pmatrix}, \\
\beta_y^T &= \begin{pmatrix} 2 & 1.80 & 1.60 \end{pmatrix}.
\end{aligned}
$$

The remaining free structural parameters are the local linear effects $\Gamma^{(1)} = (-1.0, -1.0)$ and $\Gamma^{(2)} = (1.0, 1.0)$ and the local intercept $\tau_1^{(2)} = -4.0$. With these parameters, the $R^2$ of the structural model is approximately 0.69. Reliabilities are between 0.79 and 0.86 for $x$ and between 0.86 and 0.91 for $y$. The marginal probabilities of the five categories are 0.15, 0.45, 0.15, 0.15, and 0.10 for all items in $x$, and 0.20, 0.20, 0.30, 0.20, and 0.10 for all items in $y$. The sample sizes investigated are $n \in \{500, 1000\}$.

Simulation results

Two of the 1000 replications failed to converge for $n = 500$, and all replications converged for $n = 1000$. In line with the literature (Flora & Curran, 2004), absolute relative biases larger than 5% are considered moderate to substantial. Table 1 ($n = 1000$) and Table 2 ($n = 500$) report the population parameter values ($\theta_p$), average estimates ($\bar\theta_p$), RB, $10 \cdot$SD, and $10 \cdot$RMSE for the parameters of the structural, semiparametric model. The proposed semiparametric method yields low bias for all parameters, and the RMSE is almost entirely due to the variance of the estimates.

Table 1. Population parameter values and corresponding average estimates using the AGHQ approximation with 5 quadrature points per latent dimension and 1000 replications. RB, SD, and RMSE stand for relative bias, standard deviation, and root mean squared error, respectively. The sample size is $n = 1000$.


Table 2. Population parameter values and corresponding average estimates using the AGHQ approximation with 5 quadrature points per latent dimension and 1000 replications. RB, SD, and RMSE stand for relative bias, standard deviation, and root mean squared error, respectively. The sample size is $n = 500$.


Simulation study 2

Simulation design

In the second simulation study, data ($n = 1000$) were generated from the same quadratic function as in Bauer (2005), namely

$$\eta^* = -0.5 + 0.5\xi^* + 0.5\xi^{*2} + \zeta, \tag{10}$$

where $\xi^*$ is generated from a standard normal distribution and the error term $\zeta$ is generated from a normal distribution with mean 0 and variance 0.25. The stars indicate the zero-mean parametrization, $E(\eta^*) = E(\xi^*) = 0$; indeed, $E(\eta^*) = -0.5 + 0.5E(\xi^{*2}) = 0$. Hence, after estimation, the estimated function is shifted as described in the previous section. The parameters of the measurement model are
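The structural part of the data-generating process in (10) can be simulated as in the Python sketch below (hypothetical function names; the ordinal measurement step is omitted). Because $\xi^* \sim N(0, 1)$, $E(\xi^{*2}) = 1$ and the intercept $-0.5$ exactly offsets $0.5E(\xi^{*2})$, so the simulated $\eta^*$ should have a mean close to zero. The population curve $E(\eta^* \mid \xi^*) = -0.5 + 0.5\xi^* + 0.5\xi^{*2}$ is the reference function against which the estimated conditional mean is compared.

```python
import numpy as np

def simulate_structural(n, rng):
    """Draw (xi*, eta*) from Equation (10); zeta has variance 0.25, i.e., standard deviation 0.5."""
    xi = rng.standard_normal(n)
    zeta = rng.normal(0.0, 0.5, size=n)
    eta = -0.5 + 0.5 * xi + 0.5 * xi**2 + zeta
    return xi, eta

def true_conditional_mean(xi):
    """Population function E(eta* | xi*) of Equation (10)."""
    return -0.5 + 0.5 * xi + 0.5 * xi**2

rng = np.random.default_rng(2023)
xi, eta = simulate_structural(1_000_000, rng)
print(eta.mean(), eta.var())
```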

$$
\begin{aligned}
\alpha_x^T &= \begin{pmatrix} 0 & -0.20 & -0.10 \\ 2.97 & 2.63 & 2.39 \\ 3.85 & 3.59 & 3.22 \\ 5.24 & 4.84 & 4.19 \end{pmatrix}, \\
\beta_x^T &= \begin{pmatrix} 2 & 1.80 & 1.60 \end{pmatrix}, \\
\alpha_y^T &= \begin{pmatrix} -1.70 & -1.58 & -1.42 \\ -0.67 & -0.58 & -0.46 \\ 0.89 & 0.75 & 0.86 \\ 2.88 & 2.82 & 2.45 \end{pmatrix}, \\
\beta_y^T &= \begin{pmatrix} 2 & 1.80 & 1.60 \end{pmatrix},
\end{aligned}
$$

where $\alpha_{x,1}^{(1)} = 0$ and $\beta_{y,11} = \beta_{x,11} = 2$ are the imposed restrictions. This gives an $R^2$ of the structural model of approximately 0.71 and reliabilities between 0.73 and 0.81 for all indicators. The marginal category probabilities are the same as in Simulation Study 1.

Simulation results

Figure 1 shows the conditional expected value $E(\eta^* \mid \xi^*)$ as a function of $\xi^*$ when 2 and 3 components are used (left and middle panels, respectively). The thick line represents the population function in Equation (10), whereas the thin line represents the average estimated conditional mean function. Ninety-five percent of the estimated functions lie within the shaded areas. With two components, the functional form is captured to some degree, but performance is substantially better with three components. Although the average result is acceptable, in some replications the estimated conditional mean jumps abruptly when one of the proportions is estimated to be close to zero. In practice, it is difficult to know how many components to use. One possibility is to select the number of components by the Akaike information criterion (AIC), defined as

$$AIC = 2k - 2\ell(\hat{\theta}),$$

where $k$ is the number of estimated parameters and $\ell(\hat{\theta})$ is the approximated log-likelihood at the maximum. In the simulations, AIC selects 3 components in approximately 84% of the replications. For the right panel of Figure 1, each replication was estimated with both 2 and 3 components, and the model with the lower AIC was selected. The average estimated conditional mean function (thin line) captures the population function well.

Figure 1. The conditional mean of the true data-generating process (thick line) and the average estimated conditional mean (thin line), with empirical 95% intervals (shaded blue areas), for 2 components (left), 3 components (middle), and the number of components selected by AIC (right).
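Selecting the number of components by AIC, and summarizing replicated curves as in Figure 1, can be sketched in Python as follows. The function names and the log-likelihood values in the usage line are hypothetical, and the band is computed here as pointwise empirical 2.5% and 97.5% quantiles over replications on a common grid of $\xi^*$ values; this is one plausible construction, not necessarily the one used for the figure.

```python
import numpy as np

def aic(k, loglik):
    """AIC = 2k - 2 * loglik; smaller values are preferred."""
    return 2 * k - 2 * loglik

def select_components(fits):
    """fits maps the number of components M to (k, approximated maximized log-likelihood)."""
    return min(fits, key=lambda M: aic(*fits[M]))

def pointwise_band(curves, level=0.95):
    """curves: R x G array of estimated E(eta* | xi*) on a common grid, one row per replication."""
    curves = np.asarray(curves, dtype=float)
    lo, hi = np.quantile(curves, [(1 - level) / 2, (1 + level) / 2], axis=0)
    return curves.mean(axis=0), lo, hi

# Hypothetical fits of one replication with 2 and 3 components
print(select_components({2: (20, -5000.0), 3: (26, -4990.0)}))
```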

Discussion and conclusion

In NSEM, it is common to model nonlinearities in the structural model through interaction and quadratic terms. It is sometimes appropriate to consider more flexible functions with few or no a priori assumptions about the functional form. Such models are less common in the literature, particularly for ordinal data. In this article, a semiparametric structural model (Bauer, 2005; Pek et al., 2011) is extended to ordinal data. The structural model consists of a weighted sum of locally linear functions that together approximate a globally nonlinear function. A hybrid of direct maximization (Jin et al., 2020) and the expectation-maximization algorithm (Dempster et al., 1977) is proposed and implemented for maximizing the marginal log-likelihood of the observed ordinal data. Two simulation studies were performed. The first shows that the proposed method yields parameter estimates with low relative bias, and the second illustrates that the method accurately captures a nonlinear structural model in terms of the conditional mean of the latent endogenous variable. The accurate parameter estimates and conditional mean estimates suggest that the AGHQ approximations are of high quality for $q \ge 5$ in models similar to those investigated in this study. Two and three components were used in the second study. Two components yield a reasonable and stable functional form, whereas three components yield a more accurate functional form on average, at the cost of occasionally unstable estimates. The number of components may be selected using AIC. Advantages of the proposed method include the capacity to model flexible structural relationships with ordinal data when the latent exogenous variables are not necessarily normal, although the latter point received little attention in this study.
Investigating the performance of the proposed method under moderate to severe nonnormality of the latent exogenous variables is left for future research.

Correction Statement

This article has been republished with minor changes. These changes do not impact the academic content of the article.

Additional information

Funding

The research reported in this article has been supported by Vetenskapsrådet (Swedish Research Council) under the program “New Statistical Methods for Latent Variable Models, 2017-01175”.

Notes

¹ The subscripts on $X$ and $Y$ should in principle be different, e.g., $j_x$ and $j_y$, but for notational convenience we use the same subscript $j$ for both $X$ and $Y$.

² A simple structure is not necessary, but in this first study, a simple structure is employed for simplicity.

References

Agustin, C., & Singh, J. (2005). Curvilinear effects of consumer loyalty determinants in relational exchanges. Journal of Marketing Research, 42, 96–108. https://doi.org/10.1509/jmkr.42.1.96.56961
Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision Processes, 50, 179–211. https://doi.org/10.1016/0749-5978(91)90020-T
Ajzen, I., & Madden, T. J. (1986). Prediction of goal-directed behavior: Attitudes, intentions, and perceived behavioral control. Journal of Experimental Social Psychology, 22, 453–474. https://doi.org/10.1016/0022-1031(86)90045-4
Web of Science ®Google Scholar
Arminger, G., & Muthén, B. O. (1998). A Bayesian approach to nonlinear latent variable models using the Gibbs sampler and the Metropolis-Hastings algorithm. Psychometrika, 63, 271–300. https://doi.org/10.1007/BF02294856
Web of Science ®Google Scholar
Bauer, D. J. (2005). A semiparametric approach to modeling nonlinear relations among latent variables. Structural Equation Modeling: A Multidisciplinary Journal, 12, 513–535. https://doi.org/10.1207/s15328007sem1204_1
Web of Science ®Google Scholar
Bollen, K. A. (1995). Structural equation models that are nonlinear in latent variables: A least-squares estimator. In Sociological methodology (pp. 223–251). Washington, DC: American Sociological Association.
Google Scholar
Bollen, K. A. (1996). An alternative two stage least squares (2SLS) estimator for latent variable equations. Psychometrika, 61, 109–121. https://doi.org/10.1007/BF02296961
Web of Science ®Google Scholar
R Core Team (2018). R: A language and environment for statistical computing. Vienna, AU: R Foundation for Statistical Computing.
Google Scholar
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the em algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39, 1–38. https://www.jstor.org/stable/2984875
Web of Science ®Google Scholar
Finch, W. H. (2015). Modeling nonlinear structural equation models: A comparison of the two-stage generalized additive models and the finite mixture structural equation model. Structural Equation Modeling: A Multidisciplinary Journal, 22, 60–75. https://doi.org/10.1080/10705511.2014.935749
Web of Science ®Google Scholar
Finney, S. J., & DiStefano, C. (2006). Structural equation modeling: A second course. Information Age Publishing.
Google Scholar
Flora, D. B., & Curran, P. J. (2004). An empirical evaluation of alternative methods of estimation for confirmatory factor analysis with ordinal data. Psychological Methods, 9, 446–491. https://doi.org/10.1037/1082-989X.9.4.466
PubMed Web of Science ®Google Scholar
Guo, R., Zhu, H., Chow, S.-M., & Ibrahim, J. G. (2012). Bayesian lasso for semiparametric structural equation models. Biometrics, 68, 567–577. https://doi.org/10.1111/j.1541-0420.2012.01751.x
PubMed Web of Science ®Google Scholar
Harring, J. R., Weiss, B. A., & Hsu, J.-C. (2012). A comparison of methods for estimating quadratic effects in nonlinear structural equation models. Psychological Methods, 17, 193. https://doi.org/10.1037/a0027539
PubMed Web of Science ®Google Scholar
Jaccard, J., & Wan, C. K. (1995). Measurement error in the analysis of interaction effects between continuous predictors using multiple regression: Multiple indicator and structural equation approaches. Psychological Bulletin, 117(2), 348–357. https://doi.org/10.1037/0033-2909.117.2.348
Web of Science ®Google Scholar
Jin, S., & Andersson, B. (2020). A note on the accuracy of adaptive Gauss–Hermite quadrature. Biometrika, 107, 737–744. https://doi.org/10.1093/biomet/asz080
Web of Science ®Google Scholar
Jin, S., Vegelius, J., & Yang-Wallentin, F. (2020). A marginal maximum likelihood approach for extended quadratic structural equation modeling with ordinal data. Advance Online Publication in Structural Equation Modeling: A Multidisciplinary Journal, 1–10. https://doi.org/10.1080/10705511.2020.1712552
Google Scholar
Jöreskog, K. G., & Sörbom, D. (1993). New features in lisrel 8. Scientific Software.
Google Scholar
Jöreskog, K. G., & Sörbom, D. (1996). LISREL 8: User’s reference guide. Scientific Software International.
Google Scholar
Jöreskog, K. G., & Yang, F. (1996). Non-linear structural equation models: The kenny-judd model with interaction effects. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation modeling: Issues and techniques (pp. 57–88). Lawrence Erlbaum Associates.
Google Scholar
Kelava, A., & Brandt, H. (2009). Estimation of nonlinear latent structural equation models using the extended unconstrained approach. Review of Psychology, 16, 123–132. https://hrcak.srce.hr/70644
Google Scholar
Kelava, A., Nagengast, B., & Brandt, H. (2014). A nonlinear structural equation mixture modeling approach for nonnormally distributed latent predictor variables. Structural Equation Modeling: A Multidisciplinary Journal, 21, 468–481. https://doi.org/10.1080/10705511.2014.915379
Web of Science ®Google Scholar
Kenny, D. A., & Judd, C. M. (1984). Estimating the nonlinear and interactive effects of latent variables. Psychological Bulletin, 96, 201–210. https://doi.org/10.1037/0033-2909.96.1.201
Web of Science ®Google Scholar
Klein, A., & Moosbrugger, H. (2000). Maximum likelihood estimation of latent interaction effects with the LMS method. Psychometrika, 65, 457–474. https://doi.org/10.1007/BF02296338
Web of Science ®Google Scholar
Klein, A., & Muthén, B. (2007). Quasi-maximum likelihood estimation of structural equation models with multiple interaction and quadratic effects. Multivariate Behavioral Research, 42, 647–673. https://doi.org/10.1080/00273170701710205
Web of Science ®Google Scholar
Lee, S.-Y., & Song, X.-Y. (2003). Model comparison of nonlinear structural equation models with fixed covariates. Psychometrika, 68, 27–47. https://doi.org/10.1007/BF02296651
Web of Science ®Google Scholar
Lee, S.-Y., & Zhu, H.-T. (2000). Statistical analysis of nonlinear structural equation models with continuous and polytomous data. British Journal of Mathematical and Statistical Psychology, 53, 209–232. https://doi.org/10.1348/000711000159303
PubMed Web of Science ®Google Scholar
Liu, Q., & Pierce, D. A. (1994). A note on Gauss-Hermite quadrature. Biometrika, 81, 624–629. https://doi.org/10.2307/2337136
Web of Science ®Google Scholar
Muthén, B. (2001). Second-generation structural equation modeling with a combination of categorical and continuous latent variables: New opportunities for latent class/latent growth modeling. In L. M. Collins, & A. Sayer (Eds.), New methods for the analysis of change (pp. 291–322). Washington, DC: APA.
Google Scholar
Muthén, L., & Muthén, B. (2020). Mplus. The comprehensive modelling program for applied researchers: user’s guide, 5. Los Angeles, CA: Muthén & Muthén.
Google Scholar
Naylor, J. C., & Smith, A. F. (1982). Applications of a method for the efficient computation of posterior distributions. Journal of the Royal Statistical Society. Series C, Applied Statistics, 31, 214–225. https://doi.org/10.2307/2347995
Web of Science ®Google Scholar
Pek, J., Losardo, D., & Bauer, D. J. (2011). Confidence intervals for a semiparametric approach to modeling nonlinear relations among latent variables. Structural Equation Modeling: A Multidisciplinary Journal, 18, 537–553. https://doi.org/10.1080/10705511.2011.607072
Web of Science ®Google Scholar
Rhemtulla, M., Brosseau-Liard, P. E., & Savalei, V. (2012). When can categorical variables be treated as continuous? A comparison of robust continuous and categorical sem estimation methods under suboptimal condition. Psychological Methods, 17, 354–373. https://doi.org/10.1037/a0029315
PubMed Web of Science ®Google Scholar
Rizopoulos, D., & Moustaki, I. (2008). Generalized latent variable models with non-linear effects. British Journal of Mathematical and Statistical Psychology, 61, 415–438. https://doi.org/10.1348/000711007X213963
PubMed Web of Science ®Google Scholar
Song, X.-Y., Lu, Z., & Feng, X. (2014). Latent variable models with nonparametric interaction effects of latent variables. Statistics in Medicine, 33, 1723–1737. https://doi.org/10.1002/sim.6065
PubMed Web of Science ®Google Scholar
Song, X.-Y., & Lu, Z.-H. (2010). Semiparametric latent variable models with bayesian p-splines. Journal of Computational and Graphical Statistics, 19, 590–608. https://doi.org/10.1198/jcgs.2010.09094
Web of Science ®Google Scholar
Song, X.-Y., Lu, Z.-H., Cai, J.-H., & Ip, E. H.-S. (2013). A bayesian modeling approach for generalized semiparametric structural equation models. Psychometrika, 78, 624–647. https://doi.org/10.1007/s11336-013-9323-7
PubMed Web of Science ®Google Scholar
Wall, M. M. (2009). Maximum likelihood and bayesian estimation for nonlinear structural equation models. In R. E. Millsap & A. Maydeu-Olivares (Eds.), The SAGE handbook of quantitative methods in psychology (pp. 540–567). London: Sage Publications Ltd.
Google Scholar
Wall, M. M., & Amemiya, Y. (2001). Generalized appended product indicator procedure for nonlinear structural equation analysis. Journal of Educational and Behavioral Statistics, 26, 1–29. https://doi.org/10.3102/10769986026001001
Google Scholar
Zhu, H.-T., & Lee, S.-Y. (1999). Statistical analysis of nonlinear factor analysis models. British Journal of Mathematical and Statistical Psychology, 52, 225–242. https://doi.org/10.1348/000711099159080
Web of Science ®Google Scholar

Appendix A

Integral approximation

The integrals in Equation (9), $\ell(\theta) = \sum_{i=1}^{n} \log \sum_{m_{i}=1}^{M} \int f(y_{i}, x_{i}, \Upsilon_{i}, m_{i})\, d\Upsilon_{i}$, are intractable. In this work, they are approximated using the AGHQ approximation. The Gauss-Hermite quadrature approximation of the $K$-dimensional integral $\int \exp\{g(t)\}\, dt$ is a weighted sum of function values,

$$\sum_{k_{1}=1}^{q} \cdots \sum_{k_{K}=1}^{q} w_{k_{1} k_{2} \dots k_{K}}^{*} \exp\{g(z_{k_{1}}, \dots, z_{k_{K}})\}, \tag{11}$$

where

$$w_{k_{1} k_{2} \dots k_{K}}^{*} = \prod_{j=1}^{K} w_{k_{j}} \exp\{z_{k_{j}}^{2}\}, \tag{12}$$

in which $z_{k_{j}}$ and $w_{k_{j}}$ are the Gauss-Hermite quadrature points and weights, respectively, and $q$ is the number of quadrature points per dimension. In the adaptive Gauss-Hermite quadrature (AGHQ) approximation proposed by Liu and Pierce (1994), the quadrature points are translated to the mode of $g$ and dilated according to its curvature there, which improves the approximation for unimodal functions $g(t)$.
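To illustrate Equations (11) and (12) and the benefit of adaptation, the Python sketch below implements the non-adaptive tensor-product rule with NumPy's `numpy.polynomial.hermite.hermgauss`, which returns nodes and weights for the weight function $\exp(-t^2)$, together with a one-dimensional adaptive version that centers the nodes at the mode $\hat t$ of $g$, scales them by $\sigma = (-g''(\hat t))^{-1/2}$, and multiplies by the Jacobian $\sqrt{2}\sigma$. The helper names, the Newton iteration for the mode, and the requirement that analytic derivatives of $g$ be supplied are assumptions made for illustration, not the authors' implementation.

```python
import itertools

import numpy as np
from numpy.polynomial.hermite import hermgauss


def gh_product_rule(g, K, q):
    """Non-adaptive rule of Equations (11)-(12) for the K-dimensional
    integral of exp{g(t)}."""
    z, w = hermgauss(q)  # 1-D nodes and weights for the weight exp(-t^2)
    total = 0.0
    for idx in itertools.product(range(q), repeat=K):  # all q**K node combinations
        idx = list(idx)
        point = z[idx]
        w_star = np.prod(w[idx] * np.exp(point ** 2))  # Equation (12)
        total += w_star * np.exp(g(point))
    return total


def aghq_1d(g, dg, d2g, t0, q=5, max_newton=50):
    """Adaptive version for K = 1: nodes centered at the mode of g and
    scaled by the curvature of g there."""
    t_hat = float(t0)
    for _ in range(max_newton):  # Newton iterations for the mode (g concave near it)
        step = dg(t_hat) / d2g(t_hat)
        t_hat -= step
        if abs(step) < 1e-12:
            break
    sigma = 1.0 / np.sqrt(-d2g(t_hat))
    z, w = hermgauss(q)
    t = t_hat + np.sqrt(2.0) * sigma * z  # translated and dilated nodes
    # sqrt(2) * sigma is the Jacobian of the change of variables.
    return np.sqrt(2.0) * sigma * np.sum(w * np.exp(z ** 2 + g(t)))


def g(t):
    """exp{g} is an unnormalized N(3, 0.5^2) density."""
    return -(t - 3.0) ** 2 / 0.5


exact = 0.5 * np.sqrt(2.0 * np.pi)
plain = gh_product_rule(lambda t: g(t[0]), K=1, q=5)
adaptive = aghq_1d(g, lambda t: -(t - 3.0) / 0.25, lambda t: -4.0, t0=0.0)
```

For this Gaussian-shaped integrand, `adaptive` agrees with `exact` up to rounding error, while `plain` falls well short: the five fixed nodes lie within about 2.02 of zero and barely reach the region around $t = 3$ that carries the mass. The product rule also visits $q^K$ points, so its cost grows exponentially in $K$.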

Appendix B

Maximum Likelihood Estimation

Define

$$h_{i,m_{i}}(\Upsilon_{i}; \theta) = \log f(y_{i}, x_{i}, \Upsilon_{i}, m_{i}; \theta). \tag{13}$$

Then, Equation (5) yields

$$h_{i,m_{i}}(\Upsilon_{i}; \theta) = \log f(y_{i} \mid \eta_{i}) + \log f(x_{i} \mid \xi_{i}) + \log f(\eta_{i} \mid \xi_{i}, m_{i}) + \log f(\xi_{i} \mid m_{i}) + \log f(m_{i}).$$

Combining Equations (3), $g_{j}^{(x)}(P(X_{j} \leq u_{j} \mid \xi)) = \alpha_{x,j}^{(u_{j})} - \beta_{x,j}^{T}\xi$ for $u_{j} \in \{1, \dots, U_{j}\}$ and $j = 1, \dots, \rho_{x}$; (4), $g_{j}^{(y)}(P(Y_{j} \leq v_{j} \mid \eta)) = \alpha_{y,j}^{(v_{j})} - \beta_{y,j}^{T}\eta$ for $v_{j} \in \{1, \dots, V_{j}\}$ and $j = 1, \dots, \rho_{y}$; (6), $\xi \mid m \sim N(\xi; \mu^{(m)}, \Phi^{(m)})$; (7), $f(m) = \pi^{(m)}$ for $m = 1, \dots, M$; and (8), $\eta \mid \xi, m \sim N(\eta; \tau^{(m)} + \Gamma^{(m)}\xi, \Psi)$, with (13) gives

$$\begin{aligned}
h_{i,m_{i}}(\Upsilon_{i};\theta) ={}& \sum_{j=1}^{\rho_{x}} \log P(X_{ij} = u_{ij} \mid \xi_{i}) + \sum_{j=1}^{\rho_{y}} \log P(Y_{ij} = v_{ij} \mid \eta_{i}) \\
&- \frac{K_{\eta}}{2}\log(2\pi) - \frac{1}{2}\log|\Psi| - \frac{1}{2}(\eta_{i} - \tau^{(m_{i})} - \Gamma^{(m_{i})}\xi_{i})^{T}\Psi^{-1}(\eta_{i} - \tau^{(m_{i})} - \Gamma^{(m_{i})}\xi_{i}) \\
&- \frac{K_{\xi}}{2}\log(2\pi) - \frac{1}{2}\log|\Phi^{(m_{i})}| - \frac{1}{2}(\mu^{(m_{i})} - \xi_{i})^{T}[\Phi^{(m_{i})}]^{-1}(\mu^{(m_{i})} - \xi_{i}) + \log\pi^{(m_{i})}.
\end{aligned}$$
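For concreteness, the sketch below evaluates $h_{i,m_i}(\Upsilon_i;\theta)$ for one observation in Python. It assumes a logit link for $g_j^{(x)}$ and $g_j^{(y)}$ (this section does not fix the link), stores for each indicator the increasing finite thresholds $\alpha^{(1)}, \dots, \alpha^{(U_j - 1)}$ with $P(X_j \le U_j \mid \xi) = 1$, indexes components from 0, and passes the parameters in a dictionary whose keys are hypothetical names chosen for this illustration; the parameter values in the example are arbitrary.

```python
import numpy as np


def ordinal_logprob(cat, alpha, beta, latent):
    """log P(X = cat | latent) for categories 1..U under the cumulative model
    logit P(X <= u | latent) = alpha[u-1] - beta @ latent (logit link assumed).
    `alpha` holds the U - 1 finite, increasing thresholds."""
    lin = beta @ latent
    # Cumulative probabilities P(X <= u) for u = 0, 1, ..., U.
    cum = np.concatenate(([0.0], 1.0 / (1.0 + np.exp(-(alpha - lin))), [1.0]))
    return np.log(cum[cat] - cum[cat - 1])


def mvn_logpdf(v, mean, cov):
    """Log-density of the multivariate normal N(mean, cov) at v."""
    diff = v - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (v.size * np.log(2.0 * np.pi) + logdet + diff @ np.linalg.solve(cov, diff))


def h_complete(x_cats, y_cats, xi, eta, m, p):
    """h_{i,m}(Upsilon_i; theta) for one observation and component m (0-based)."""
    log_x = sum(ordinal_logprob(c, a, b, xi)
                for c, a, b in zip(x_cats, p["alpha_x"], p["beta_x"]))
    log_y = sum(ordinal_logprob(c, a, b, eta)
                for c, a, b in zip(y_cats, p["alpha_y"], p["beta_y"]))
    log_eta = mvn_logpdf(eta, p["tau"][m] + p["Gamma"][m] @ xi, p["Psi"])  # Equation (8)
    log_xi = mvn_logpdf(xi, p["mu"][m], p["Phi"][m])                       # Equation (6)
    return log_x + log_y + log_eta + log_xi + np.log(p["pi"][m])           # Equation (7)


# Example with arbitrary parameter values: one latent exogenous and one endogenous
# variable, two components, two X-indicators and one Y-indicator, four categories each.
thresholds = np.array([-1.0, 0.5, 2.0])
params = {
    "alpha_x": [thresholds, thresholds], "beta_x": [np.array([0.8]), np.array([1.2])],
    "alpha_y": [thresholds], "beta_y": [np.array([1.0])],
    "tau": [np.zeros(1), np.ones(1)], "Gamma": [np.eye(1), 0.5 * np.eye(1)],
    "Psi": np.eye(1), "mu": [np.zeros(1), np.ones(1)],
    "Phi": [np.eye(1), np.eye(1)], "pi": [0.4, 0.6],
}
value = h_complete([1, 3], [2], np.array([0.3]), np.array([0.1]), 1, params)
```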

Measurement Model Estimation

Since the terms of the sum over $m_{i}$ in (9) do not involve component-specific parameters in $\theta_{1}$, direct maximization, as proposed by Jin et al. (2020), is used for $\theta_{1}$ given the current estimates of $\theta_{2}$. The observed likelihood is

$$L(\theta) = \prod_{i=1}^{n} L_{i}(\theta) = \prod_{i=1}^{n} \sum_{m_{i}=1}^{M} \int \exp\{h_{i,m_{i}}(\Upsilon_{i};\theta)\}\, d\Upsilon_{i}, \tag{14}$$

where $L_{i}$ is the observed likelihood of the $i$th observation. Let the mode $\hat{\Upsilon}_{i,m_{i}}$ be the solution to

$$\frac{\partial h_{i,m_{i}}}{\partial \Upsilon_{i}} = 0,$$

and let the Hessian be

$$H_{i,m_{i}} = \frac{\partial^{2} h_{i,m_{i}}}{\partial \Upsilon_{i}\, \partial \Upsilon_{i}^{T}},$$

evaluated at the mode. Further, let $L_{i,m_{i}} L_{i,m_{i}}^{T} = (-H_{i,m_{i}}(\hat{\Upsilon}_{i,m_{i}};\theta))^{-1}$. Applying the AGHQ approximation in Equations (11) and (12) to the $i$th factor of Equation (14) gives

$$L_{i}(\theta) \approx \sum_{m_{i}=1}^{M} \exp\left[\frac{K}{2}\log 2 - \frac{1}{2}\log\left|-H_{i,m_{i}}(\hat{\Upsilon}_{i,m_{i}};\theta)\right|\right] \sum_{k_{1}=1}^{q} \cdots \sum_{k_{K}=1}^{q} w_{k_{1} k_{2} \dots k_{K}}^{*} \exp\left[h_{i,m_{i}}\left(\tilde{\Upsilon}_{i,m_{i}}(z_{k_{1}}, z_{k_{2}}, \dots, z_{k_{K}});\theta\right)\right], \tag{15}$$

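where $\tilde{\Upsilon}_{i,m_{i}}(z_{k_{1}}, \dots, z_{k_{K}}) = \sqrt{2}\, L_{i,m_{i}} z_{k_{1} k_{2} \dots k_{K}} + \hat{\Upsilon}_{i,m_{i}}$, with $z_{k_{1} k_{2} \dots k_{K}} = (z_{k_{1}}, \dots, z_{k_{K}})^{T}$, are the latent variables translated to the mode and dilated about it, and $K = K_{\xi} + K_{\eta}$ is the dimension of $\Upsilon_{i}$. With $\ell_{i}(\theta) = \log L_{i}(\theta)$, the corresponding observed log-likelihood is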

$$\ell(\theta) = \sum_{i=1}^{n} \ell_{i}(\theta) \approx \sum_{i=1}^{n} \log \sum_{m_{i}=1}^{M} \exp\{\ell_{i,m_{i}}^{(aghq)}(\theta, \hat{\Upsilon}_{i,m_{i}})\},$$

where $\ell_{i,m_{i}}^{(aghq)}(\theta, \hat{\Upsilon}_{i,m_{i}})$ is the log of the $m_{i}$th term of the outer sum in Equation (15). $\theta_{1}$ is updated by solving

$$\frac{\partial \ell}{\partial \theta_{1}} = 0 \tag{16}$$

with respect to $\theta_{1}$, using the approximation in Equation (15). The left-hand side of Equation (16) is then approximated by

$$\frac{\partial \ell}{\partial \theta_{1}} \approx \sum_{i=1}^{n} \frac{\sum_{m_{i}=1}^{M} \frac{\partial \ell_{i,m_{i}}^{(aghq)}(\theta, \hat{\Upsilon}_{i,m_{i}})}{\partial \theta_{1}} \exp\{\ell_{i,m_{i}}^{(aghq)}(\theta, \hat{\Upsilon}_{i,m_{i}})\}}{\sum_{m_{i}=1}^{M} \exp\{\ell_{i,m_{i}}^{(aghq)}(\theta, \hat{\Upsilon}_{i,m_{i}})\}}.$$
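A minimal sketch of these quantities for one observation follows. `aghq_log_integral` returns the logarithm of one term of the outer sum in Equation (15), given the mode and the Hessian there (locating the mode is not shown), and works on the log scale with a log-sum-exp over the $q^K$ nodes; `combine_components` then forms $\ell_i$ and the gradient expression above, whose component weights are the softmax of the $\ell^{(aghq)}_{i,m_i}$. The function names, the log-scale evaluation, and the assumption that per-component gradients are already available are choices made for this illustration, not details taken from the authors' implementation.

```python
import itertools

import numpy as np
from numpy.polynomial.hermite import hermgauss


def log_sum_exp(values):
    """Numerically stable log(sum(exp(values)))."""
    top = np.max(values)
    return top + np.log(np.sum(np.exp(values - top)))


def aghq_log_integral(h, mode, hessian, q=5):
    """log of one component term of Equation (15): AGHQ for the integral of exp{h(u)}."""
    K = mode.size
    neg_h = -hessian
    chol = np.linalg.cholesky(np.linalg.inv(neg_h))  # L L^T = (-H)^{-1}
    _, logdet = np.linalg.slogdet(neg_h)
    log_prefactor = 0.5 * K * np.log(2.0) - 0.5 * logdet
    z, w = hermgauss(q)
    log_terms = []
    for idx in itertools.product(range(q), repeat=K):
        idx = list(idx)
        zk = z[idx]
        u = mode + np.sqrt(2.0) * chol @ zk                          # translated and dilated point
        log_terms.append(np.sum(np.log(w[idx]) + zk ** 2) + h(u))  # log w* + h(u)
    return log_prefactor + log_sum_exp(np.asarray(log_terms))


def combine_components(log_terms, grads):
    """ell_i = log sum_m exp{ell_m} and its gradient, a softmax-weighted
    average of the per-component gradients (rows of `grads`)."""
    log_terms = np.asarray(log_terms, dtype=float)
    loglik = log_sum_exp(log_terms)
    weights = np.exp(log_terms - loglik)  # nonnegative and summing to one
    return loglik, weights @ np.asarray(grads, dtype=float)


# Check 1: Gaussian-shaped h, for which AGHQ is exact:
# log-integral = (K/2) log(2 pi) - 0.5 log|A| with K = 2.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
a = np.array([1.0, -1.0])
log_approx = aghq_log_integral(lambda u: -0.5 * (u - a) @ A @ (u - a), a, -A)
log_exact = np.log(2.0 * np.pi) - 0.5 * np.log(np.linalg.det(A))

# Check 2: two components whose log terms are theta and 2 * theta at theta = 0.3.
loglik, grad = combine_components([0.3, 0.6], [[1.0], [2.0]])
```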

Acknowledging that the mode $\hat{\Upsilon}_{i,m_{i}}$ depends on $\theta$, the gradient of $\ell_{i,m_{i}}(\theta, \hat{\Upsilon}_{i,m_{i}}(\theta))$ is

$$\frac{\partial \ell_{i,m_{i}}(\theta, \hat{\Upsilon}_{i,m_{i}}(\theta))}{\partial \theta_{1}} = \left.\frac{\partial \ell_{i,m_{i}}(\theta)}{\partial \theta_{1}}\right|_{\Upsilon = \hat{\Upsilon}_{i,m_{i}}} + \left(\frac{\partial \hat{\Upsilon}_{i,m_{i}}(\theta)}{\partial \theta_{1}^{T}}\right)^{T} \left.\frac{\partial \ell_{i,m_{i}}(\Upsilon)}{\partial \Upsilon}\right|_{\Upsilon = \hat{\Upsilon}_{i,m_{i}}}, \tag{17}$$

where $\partial \ell_{i,m_{i}}(\theta) / \partial \theta_{1}$ is the derivative of $\ell_{i,m_{i}}$ with respect to $\theta_{1}$ treating $\hat{\Upsilon}_{i,m_{i}}$ as constant, whereas $\partial \ell_{i,m_{i}}(\Upsilon) / \partial \Upsilon$ is the derivative of $\ell_{i,m_{i}}$ with respect to the latent variables treating $\theta$ as fixed. Applying the implicit function theorem to $\partial h_{i,m_{i}} / \partial \Upsilon_{i} = 0$ at the mode, the first factor in the second term of (17) is

$$\frac{\partial \hat{\Upsilon}_{i,m_{i}}(\theta)}{\partial \theta_{1}^{T}} = -\left(\left.\frac{\partial^{2} h_{i,m_{i}}(\Upsilon;\theta)}{\partial \Upsilon\, \partial \Upsilon^{T}}\right|_{\Upsilon = \hat{\Upsilon}_{i,m_{i}}}\right)^{-1} \left.\frac{\partial^{2} h_{i,m_{i}}(\Upsilon;\theta)}{\partial \Upsilon\, \partial \theta_{1}^{T}}\right|_{\Upsilon = \hat{\Upsilon}_{i,m_{i}}}.$$
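The implicit-function-theorem expression can be checked numerically on a toy problem. The sketch below uses a hypothetical scalar function $h(u;\theta) = \theta u - e^{u} - u^{2}/2$, chosen only because its mode has to be located numerically; here $\partial^2 h/\partial u^2 = -e^{u} - 1$ and $\partial^2 h/\partial u\,\partial\theta = 1$, so the formula gives $1/(e^{\hat u} + 1)$, which is compared with a central finite difference of the numerically located mode.

```python
import numpy as np


def toy_mode(theta, u0=0.0, tol=1e-13):
    """Mode of the hypothetical h(u; theta) = theta*u - exp(u) - u^2/2,
    found by Newton's method on dh/du = theta - exp(u) - u."""
    u = u0
    for _ in range(100):
        step = (theta - np.exp(u) - u) / (-np.exp(u) - 1.0)
        u -= step
        if abs(step) < tol:
            break
    return u


def toy_mode_derivative(theta):
    """Implicit function theorem: d u_hat / d theta =
    -(d2h/du2)^{-1} d2h/(du dtheta), evaluated at the mode."""
    u_hat = toy_mode(theta)
    d2h_du2 = -np.exp(u_hat) - 1.0
    d2h_du_dtheta = 1.0
    return -d2h_du_dtheta / d2h_du2


theta, eps = 0.7, 1e-5
finite_diff = (toy_mode(theta + eps) - toy_mode(theta - eps)) / (2.0 * eps)
analytic = toy_mode_derivative(theta)
```

In the setting above, the same pattern applies with matrices: the Hessian of $h_{i,m_i}$ with respect to $\Upsilon$ and the mixed second derivatives with respect to $\Upsilon$ and $\theta_1$.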

To solve Equation (16), the BFGS quasi-Newton approximation to the Newton–Raphson method is used: the inverse of the Hessian of $\ell$ with respect to $\theta_{1}$ is approximated and updated by a low-rank (rank-two) correction at each iteration. The updated $\theta_{1}$ is obtained by running a number of BFGS iterations.
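As a generic illustration of this step, the sketch below maximizes a log-likelihood by applying BFGS to its negative: the inverse-Hessian approximation is updated with the standard rank-two BFGS formula, and a simple backtracking (Armijo) line search chooses the step length. The line search, the tolerances, and the name `bfgs_maximize` are assumptions for illustration; in the method described here, the objective would be the AGHQ-approximated $\ell$ and the gradient the expression derived above.

```python
import numpy as np


def bfgs_maximize(loglik, grad, theta0, max_iter=100, tol=1e-8):
    """Maximize loglik(theta) by BFGS applied to f(theta) = -loglik(theta)."""
    theta = np.asarray(theta0, dtype=float)
    n = theta.size
    inv_hess = np.eye(n)  # approximation to the inverse Hessian of f
    g = -grad(theta)      # gradient of f
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        direction = -inv_hess @ g
        f0, slope, step = -loglik(theta), g @ direction, 1.0
        # Backtracking line search until the Armijo condition holds.
        while -loglik(theta + step * direction) > f0 + 1e-4 * step * slope and step > 1e-10:
            step *= 0.5
        s = step * direction
        g_new = -grad(theta + s)
        y = g_new - g
        sy = s @ y
        if sy > 1e-12:  # skip the update when the curvature condition fails
            rho = 1.0 / sy
            V = np.eye(n) - rho * np.outer(s, y)
            inv_hess = V @ inv_hess @ V.T + rho * np.outer(s, s)  # rank-two BFGS update
        theta, g = theta + s, g_new
    return theta


# Example: a quadratic log-likelihood with maximum at c.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
c = np.array([0.5, -1.5])
theta_hat = bfgs_maximize(lambda t: -0.5 * (t - c) @ A @ (t - c),
                          lambda t: -A @ (t - c),
                          theta0=np.zeros(2))
```

Working with the inverse-Hessian approximation avoids forming and inverting second derivatives of $\ell$ with respect to $\theta_1$, which would otherwise require differentiating through the AGHQ approximation twice.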

Structural Model Estimation

If the quasi-Newton–Raphson method of the previous section were also used for $\theta_{2}$, the sum in Equation (14) would introduce instability in the parameters that depend on $m_{i}$. All elements in $\theta_{2}$ depend on $m_{i}$ except for $\Psi$, which is common to all components in this setting only for simplicity; $\Psi$ is kept in $\theta_{2}$ for convenience when extending the method to more general settings. The EM algorithm of Dempster et al. (1977) is used to update $\theta_{2}$ given the current estimates of $\theta_{1}$, as follows.

Treating $\Upsilon_{i}$ and $m_{i}$ as random variables, let their conditional distribution given the observed data be

$$f_{i}(\Upsilon_{i}, m_{i} \mid y_{i}, x_{i}; \theta_{1}, \theta_{2}^{(\mathrm{old})}) = \frac{f_{i}(y_{i}, x_{i}, \Upsilon_{i}, m_{i}; \theta_{1}, \theta_{2}^{(\mathrm{old})})}{\sum_{l=1}^{M} \int f_{i}(y_{i}, x_{i}, \Upsilon, l; \theta_{1}, \theta_{2}^{(\mathrm{old})})\, d\Upsilon} \tag{18}$$

at the current iteration, where $\theta_{2}^{(\mathrm{old})}$ denotes the parameter values obtained in the previous iteration. The update of $\theta_{2}$ is the vector that maximizes the expected complete-data log-likelihood under the conditional distribution in Equation (18). Throughout this section, $\theta_{1}$ is fixed at its current estimate and is therefore dropped from the notation. The objective function is then

$$\begin{aligned}
Q(\theta_{2}, \theta_{2}^{(\mathrm{old})}) &= \sum_{i=1}^{n} Q_{i}(\theta_{2}, \theta_{2}^{(\mathrm{old})}) \\
&= \sum_{i=1}^{n} \frac{\sum_{m_{i}=1}^{M} \int h_{i,m_{i}}(\Upsilon_{i};\theta_{2})\, f_{i}(y_{i}, x_{i}, \Upsilon_{i}, m_{i}; \theta_{2}^{(\mathrm{old})})\, d\Upsilon_{i}}{\sum_{m_{i}=1}^{M} \int f_{i}(y_{i}, x_{i}, \Upsilon_{i}, m_{i}; \theta_{2}^{(\mathrm{old})})\, d\Upsilon_{i}}.
\end{aligned} \tag{19}$$

The maximum of (19) is attained at $\theta_{2}^{(\mathrm{upd})}$, which can be expressed in terms of the following integrals:

$$\begin{aligned}
\int f_{i}(y_{i}, x_{i}, \Upsilon_{i}, m; \theta_{2}^{(\mathrm{old})})\, d\Upsilon_{i} &\equiv \rho_{i,m}, \\
\int \xi_{i}\, f_{i}(y_{i}, x_{i}, \Upsilon_{i}, m; \theta_{2}^{(\mathrm{old})})\, d\Upsilon_{i} &\equiv \rho_{\xi;i,m}, \\
\int \eta_{i}\, f_{i}(y_{i}, x_{i}, \Upsilon_{i}, m; \theta_{2}^{(\mathrm{old})})\, d\Upsilon_{i} &\equiv \rho_{\eta;i,m}, \\
\int \xi_{i}\xi_{i}^{T}\, f_{i}(y_{i}, x_{i}, \Upsilon_{i}, m; \theta_{2}^{(\mathrm{old})})\, d\Upsilon_{i} &\equiv \Delta_{\xi\xi;i,m}, \\
\int \eta_{i}\eta_{i}^{T}\, f_{i}(y_{i}, x_{i}, \Upsilon_{i}, m; \theta_{2}^{(\mathrm{old})})\, d\Upsilon_{i} &\equiv \Delta_{\eta\eta;i,m}, \\
\int \xi_{i}\eta_{i}^{T}\, f_{i}(y_{i}, x_{i}, \Upsilon_{i}, m; \theta_{2}^{(\mathrm{old})})\, d\Upsilon_{i} &\equiv \Delta_{\xi\eta;i,m},
\end{aligned} \tag{20}$$

and the current estimate of the effective number of observations per component,

$$\hat{n}^{(m)} = \sum_{i=1}^{n} \frac{\rho_{i,m}}{\sum_{l=1}^{M} \rho_{i,l}}.$$

Then the updated parameter vector $\theta_{2}^{(\mathrm{upd})}$ is composed of

$$\begin{aligned}
\tau_{(\mathrm{upd})}^{(m)} &= \frac{1}{\hat{n}^{(m)}} \sum_{i=1}^{n} \frac{\rho_{\eta;i,m} - \Gamma_{(\mathrm{old})}^{(m)} \rho_{\xi;i,m}}{\sum_{l=1}^{M} \rho_{i,l}}, \\
\Gamma_{(\mathrm{upd})}^{(m)} &= \left[\sum_{i=1}^{n} \frac{\Delta_{\xi\eta;i,m}^{T} - \tau_{(\mathrm{old})}^{(m)} \rho_{\xi;i,m}^{T}}{\sum_{l=1}^{M} \rho_{i,l}}\right] \left[\sum_{i=1}^{n} \frac{\Delta_{\xi\xi;i,m}}{\sum_{l=1}^{M} \rho_{i,l}}\right]^{-1}, \\
\Phi_{(\mathrm{upd})}^{(m)} &= \mu_{(\mathrm{old})}^{(m)} (\mu_{(\mathrm{old})}^{(m)})^{T} + \frac{1}{\hat{n}^{(m)}} \left[\sum_{i=1}^{n} \frac{\Delta_{\xi\xi;i,m} - \mu_{(\mathrm{old})}^{(m)} \rho_{\xi;i,m}^{T} - \rho_{\xi;i,m} (\mu_{(\mathrm{old})}^{(m)})^{T}}{\sum_{l=1}^{M} \rho_{i,l}}\right], \\
\mu_{(\mathrm{upd})}^{(m)} &= \frac{1}{\hat{n}^{(m)}} \sum_{i=1}^{n} \frac{\rho_{\xi;i,m}}{\sum_{l=1}^{M} \rho_{i,l}}, \\
\pi_{(\mathrm{upd})}^{(m)} &= \frac{\hat{n}^{(m)}}{n},
\end{aligned}$$

for $m = 1, \dots, M$, and

$$\Psi_{(\mathrm{upd})} = \frac{1}{n} \sum_{i=1}^{n} \frac{\Omega_{i}}{\sum_{l=1}^{M} \rho_{i,l}},$$

where

$$\begin{aligned}
\Omega_{i} = \sum_{l=1}^{M} \Big[ &\Delta_{\eta\eta;i,l} - \rho_{\eta;i,l} [\tau_{(\mathrm{old})}^{(l)}]^{T} - \tau_{(\mathrm{old})}^{(l)} \rho_{\eta;i,l}^{T} - \Delta_{\xi\eta;i,l}^{T} [\Gamma_{(\mathrm{old})}^{(l)}]^{T} - \Gamma_{(\mathrm{old})}^{(l)} \Delta_{\xi\eta;i,l} \\
&+ \tau_{(\mathrm{old})}^{(l)} [\tau_{(\mathrm{old})}^{(l)}]^{T} \rho_{i,l} + \tau_{(\mathrm{old})}^{(l)} \rho_{\xi;i,l}^{T} [\Gamma_{(\mathrm{old})}^{(l)}]^{T} + \Gamma_{(\mathrm{old})}^{(l)} \rho_{\xi;i,l} [\tau_{(\mathrm{old})}^{(l)}]^{T} + \Gamma_{(\mathrm{old})}^{(l)} \Delta_{\xi\xi;i,l} [\Gamma_{(\mathrm{old})}^{(l)}]^{T} \Big].
\end{aligned}$$
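The closed-form updates above translate directly into array operations once the integrals in (20) are available for every observation and component. The NumPy sketch below assumes they are stored in arrays indexed by $i$ and $m$ (shapes in the docstring) and applies the formulas as written, including the use of $\mu^{(m)}_{(\mathrm{old})}$, $\tau^{(m)}_{(\mathrm{old})}$, and $\Gamma^{(m)}_{(\mathrm{old})}$ on the right-hand sides; the function and argument names are hypothetical. The accompanying check uses randomly generated latent values treated as known, in which case $\mu^{(m)}_{(\mathrm{upd})}$ and $\Phi^{(m)}_{(\mathrm{upd})}$ reduce to weighted means and variances.

```python
import numpy as np


def m_step(rho, rho_xi, rho_eta, D_xixi, D_etaeta, D_xieta, mu_old, tau_old, Gamma_old):
    """One update of theta_2 from the integrals in (20).

    Shapes: rho (n, M); rho_xi (n, M, Kx); rho_eta (n, M, Ke);
    D_xixi (n, M, Kx, Kx); D_etaeta (n, M, Ke, Ke); D_xieta (n, M, Kx, Ke);
    mu_old (M, Kx); tau_old (M, Ke); Gamma_old (M, Ke, Kx)."""
    n, M = rho.shape
    inv_den = 1.0 / rho.sum(axis=1)  # 1 / sum_l rho_{i,l}
    n_hat = rho.T @ inv_den          # effective number of observations per component
    # Sums over i of each integral divided by sum_l rho_{i,l}.
    S_xi = np.einsum("i,imk->mk", inv_den, rho_xi)
    S_eta = np.einsum("i,imk->mk", inv_den, rho_eta)
    S_xixi = np.einsum("i,imkl->mkl", inv_den, D_xixi)
    S_etaeta = np.einsum("i,imkl->mkl", inv_den, D_etaeta)
    S_xieta = np.einsum("i,imkl->mkl", inv_den, D_xieta)

    pi = n_hat / n
    mu = S_xi / n_hat[:, None]
    Phi, tau, Gamma = [], [], []
    Psi = np.zeros((tau_old.shape[1], tau_old.shape[1]))
    for m in range(M):
        mo, to, Go = mu_old[m], tau_old[m], Gamma_old[m]
        Phi.append(np.outer(mo, mo)
                   + (S_xixi[m] - np.outer(mo, S_xi[m]) - np.outer(S_xi[m], mo)) / n_hat[m])
        tau.append((S_eta[m] - Go @ S_xi[m]) / n_hat[m])
        Gamma.append((S_xieta[m].T - np.outer(to, S_xi[m])) @ np.linalg.inv(S_xixi[m]))
        # Component m's contribution to sum_i Omega_i / sum_l rho_{i,l}.
        G_xi = Go @ S_xi[m]
        Psi += (S_etaeta[m] - np.outer(S_eta[m], to) - np.outer(to, S_eta[m])
                - S_xieta[m].T @ Go.T - Go @ S_xieta[m]
                + n_hat[m] * np.outer(to, to) + np.outer(to, G_xi) + np.outer(G_xi, to)
                + Go @ S_xixi[m] @ Go.T)
    return pi, mu, np.array(Phi), np.array(tau), np.array(Gamma), Psi / n


def row_outer(a, b):
    """Row-wise outer products: (n, p) and (n, r) -> (n, p, r)."""
    return a[:, :, None] * b[:, None, :]


# Check with random values: if each observation's latent values were known, the
# integrals reduce to weighted point values and mu, Phi become weighted moments.
rng = np.random.default_rng(0)
n, M = 200, 2
xi, eta = rng.normal(size=(n, 1)), rng.normal(size=(n, 1))
r = rng.uniform(0.1, 1.0, size=(n, M))
r /= r.sum(axis=1, keepdims=True)  # component probabilities per observation
rho = rng.uniform(0.5, 2.0, size=n)[:, None] * r
w_mean = (r.T @ xi) / r.sum(axis=0)[:, None]
pi, mu, Phi, tau, Gamma, Psi = m_step(
    rho, rho[:, :, None] * xi[:, None, :], rho[:, :, None] * eta[:, None, :],
    rho[:, :, None, None] * row_outer(xi, xi)[:, None],
    rho[:, :, None, None] * row_outer(eta, eta)[:, None],
    rho[:, :, None, None] * row_outer(xi, eta)[:, None],
    mu_old=w_mean, tau_old=np.zeros((M, 1)), Gamma_old=np.zeros((M, 1, 1)))
```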

The integrals in (20) are intractable. For $\rho_{i,m}$, the AGHQ approximation is applied as in the previous section. For the other integrals, the same weights and quadrature points are used. These integrals should still be well approximated asymptotically, since the original integrand is multiplied by a polynomial of degree at most 2, as argued by Naylor and Smith (1982). Once the updated estimates are obtained, the previous estimates $\theta_{2}^{(\mathrm{old})}$ are replaced by $\theta_{2}^{(\mathrm{upd})}$ and the procedure is repeated a number of times.
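The reuse of quadrature nodes can be sketched as follows: the translated and dilated points and weights that approximate $\rho_{i,m}$ are reused after multiplying the integrand by $1$, $u$, or $u u^{T}$. The helper name `aghq_moments` and the direct (non-log-scale) weighting are assumptions made for illustration. For a Gaussian-shaped integrand, a Gauss-Hermite rule with $q \ge 2$ points per dimension reproduces these polynomial moments exactly, which the example uses as a check.

```python
import itertools

import numpy as np
from numpy.polynomial.hermite import hermgauss


def aghq_moments(h, mode, hessian, q=5):
    """Zeroth, first and second moments of exp{h(u)} using one shared set of
    adaptive Gauss-Hermite points and weights."""
    K = mode.size
    neg_h = -hessian
    chol = np.linalg.cholesky(np.linalg.inv(neg_h))             # L L^T = (-H)^{-1}
    prefactor = 2.0 ** (K / 2) / np.sqrt(np.linalg.det(neg_h))  # Jacobian of the change of variables
    z, w = hermgauss(q)
    zeroth, first, second = 0.0, np.zeros(K), np.zeros((K, K))
    for idx in itertools.product(range(q), repeat=K):
        idx = list(idx)
        zk = z[idx]
        u = mode + np.sqrt(2.0) * chol @ zk
        weight = prefactor * np.prod(w[idx] * np.exp(zk ** 2)) * np.exp(h(u))
        zeroth += weight                   # plays the role of rho_{i,m}
        first += weight * u                # stacked first-moment integrals
        second += weight * np.outer(u, u)  # stacked second-moment integrals
    return zeroth, first, second


# Check with a Gaussian-shaped h: mean a and second moment A^{-1} + a a^T.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
a = np.array([1.0, -1.0])
rho, rho_u, Delta = aghq_moments(lambda u: -0.5 * (u - a) @ A @ (u - a), a, -A)
```

In the setting of (20), `u` would stack $\xi_i$ and $\eta_i$, so that $\rho_{\xi;i,m}$ and $\rho_{\eta;i,m}$ are sub-vectors of `rho_u`, and $\Delta_{\xi\xi;i,m}$, $\Delta_{\eta\eta;i,m}$, and $\Delta_{\xi\eta;i,m}$ are sub-blocks of `Delta`.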

Table 1. Parameter and corresponding average estimates using AGHQ approximation with 5 quadrature points per latent dimension and 1000 replications. RB, SD, and RMSE stand for relative bias, standard deviation, and root mean squared error, respectively. The sample size is $N = 1000$

Table 2. Parameter and corresponding average estimates using AGHQ approximation with 5 quadrature points per latent dimension and 1000 replications. RB, SD, and RMSE stand for relative bias, standard deviation, and root mean squared error, respectively. The sample size is $n = 500$
