A Marginal Maximum Likelihood Approach for Extended Quadratic Structural Equation Modeling with Ordinal Data

ABSTRACT

The literature on non-linear structural equation modeling is plentiful. Despite this fact, few studies consider interactions between exogenous and endogenous latent variables. Further, it is well known that treating ordinal data as continuous produces bias, a problem that is exacerbated when non-linear relationships between latent variables are incorporated. A marginal maximum likelihood-based approach is proposed in order to fit a non-linear structural equation model including interactions between exogenous and endogenous latent variables in the presence of ordinal data. In this approach, the exact gradient of the approximated observed log-likelihood is calculated in order to attain the approximated maximum likelihood estimator. A simulation study shows that the proposed method provides estimates with low bias and accurate coverage probabilities.

Introduction

Non-linear structural equation modeling (SEM) has been of great interest in the past few decades. Fruitful theories in psychology and business studies incorporate non-linear relations among latent variables. In the Theory of Planned Behavior, Ajzen (1991) emphasized the relevance of the interaction between Motivation and Ability on Behavior. In a study focusing on consumer loyalty, Agustin and Singh (2005) proposed a quadratic relationship between the latent constructs Satisfaction, Trust, Value, and Loyalty intentions in the framework of marketing sciences. In order to fit a non-linear SEM model, various methods have been proposed, which include but are not limited to the product indicator approaches (e.g., Kelava & Brandt, 2009; Marsh, Wen, & Hau, 2004; Wall & Amemiya, 2001; Yang-Jonsson, 1997), the method of moments (e.g., Mooijaart & Bentler, 2010; Wall & Amemiya, 2000), two-stage least squares (e.g., Bollen, 1995, 1996), the distributional analytic approaches (e.g., Klein & Moosbrugger, 2000; Klein & Muthén, 2007; Lee & Zhu, 2002), and the Bayesian methods (e.g., Lee & Song, 2004a; Zhu & Lee, 1999). The reader is directed to Brandt, Kelava, and Klein (2014) and Harring, Weiss, and Hsu (2012) for reviews and simulation studies comparing the aforementioned methods. Although the methods for fitting a non-linear SEM model are numerous, they are not flawless and do not always meet the needs of real data analysis.

First, most studies have focused on models with continuous indicators. In such cases, distributional assumptions are violated if indicators are ordinal. This has been investigated and observed extensively in linear models (e.g., Finney & DiStefano, 2006; Flora & Curran, 2004). Rhemtulla, Brosseau-Liard, and Savalei (2012) showed that ordinal data cannot be treated as continuous, unless the number of categories is large and the categories are equidistant. Due to the complexity of the non-linear model, we do not expect treating ordinal data as continuous to work well in a non-linear SEM model either. Lee and Zhu (2000) first considered a non-linear SEM model in the presence of polytomous indicators and proposed a Bayesian approach to estimate the parameters. Since then, various studies that directly handle the categorical nature of the data have emerged. Lee and Song (2003) extended the model in Lee and Zhu (2000) by incorporating covariates in the measurement and the structural model and proposed an MCEM algorithm (Wei & Tanner, 1990) to estimate the parameters. A similar algorithm was used by Song and Lee (2005) when the non-linear model has only dichotomous indicators. Lee and Song (2008) and Song and Lee (2006a) considered Bayesian estimation and comparison of non-linear models with a mixture of continuous, dichotomous, and ordinal indicators. Rizopoulos and Moustaki (2008) introduced non-linear terms of the latent variables to generalized latent variable models with covariates. When the non-linear model contains missing values, Lee and Song (2004a) proposed a Bayesian approach for a mixture of continuous and ordinal indicators, which was extended by Cai, Song, and Lee (2008) to model non-ignorable missing data. Lee and Song (2007) proposed a unified maximum likelihood approach for non-linear models with different types of indicators and with or without missing values. Lee, Song, and Cai (2010) exemplified the use of the Bayesian approach in non-linear models using dichotomous indicators as examples. The studies mentioned above are devoted to single-group data analysis. This work was further generalized by Song and Lee (2006b) to multigroup analysis and by Kelava and Brandt (2014), Lee and Song (2004b), Lee et al. (2009), and Song and Lee (2004) to multilevel analysis.

Second, in the Theory of Planned Behavior, Ajzen (1991) proposed that the effect of Intention (endogenous latent variable) on Behavior is affected by Perceived Behavioral Control (exogenous latent variable). Throughout the paper, an exogenous variable is a variable that is not caused by any other variables in the model and an endogenous variable is a variable that is caused by some variables in the model. Even though some methods can incorporate ordinal data into the model, they generally lack the interaction between endogenous latent variables and exogenous latent variables. In contrast, Agustin and Singh (2005) proposed an interaction effect between Consumer Trust (endogenous latent variable) and Satisfaction (exogenous latent variable), but they treated ordinal data as continuous and used the two-step approach of Ping (1995). To the best of our knowledge, there are no contemporary methods that can estimate non-linear SEM models including interactions between exogenous and endogenous latent variables in the presence of ordinal indicators.

The main purpose of the current paper is to fit the non-linear SEM model with ordinal data in the presence of interactions between latent exogenous and latent endogenous variables. We propose a marginal maximum likelihood approach to estimate the fixed unknown parameters in the model. Instead of the commonly implemented expectation-maximization (EM; Dempster, Laird, & Rubin, 1977) algorithm, we propose to use direct maximization. As we shall explain in later sections, the direct maximum likelihood approach is based on the true gradient of the approximated observed log-likelihood, which makes the sandwich estimator of standard errors easier to obtain. In contrast, the sandwich estimator of standard errors is not directly available in the EM algorithm. Further, when the gradients in the M-step are approximated, the EM algorithm can be less accurate than the direct maximum likelihood approach with the same approximation method (Jin & Andersson, 2019a).

The article is organized as follows. The model is presented in the next section, which is followed by a description of the proposed method and an outline of the suggested procedure for directly maximizing the marginal likelihood. Next, a simulation study is conducted to investigate the small-sample performance of the proposed method. An empirical example is presented to illustrate the method, followed by a discussion with conclusions.

Non-linear structural equation model

Let $ξ$ (exogenous) and $η$ (endogenous) be latent random vectors with dimensions $K_{ξ}$ and $K_{η}$, respectively. The measurement model for $X_{j}$, the $j^{\text{th}}$ ordinal indicator of $ξ$, is:

$$g_{j}^{(x)} (P (X_{j} \leq u_{j} \mid ξ)) = α_{x, j}^{(u_{j})} - β_{x, j}^{T} ξ, \quad u_{j} \in \{1, \ldots, U_{j}\}, \tag{1}$$

where $U_{j}$ is the number of response categories for the ordinal indicator $X_{j}$, $g_{j}^{(x)}$ is the link function, $α_{x, j}^{(u_{j})}$ is the category-specific intercept for observing $u_{j}$, and $β_{x, j}$ is the factor loading vector for $X_{j}$. Such a setting is similar to the cumulative link model in the context of generalized linear models (Agresti, 2015, p. 213), but with only latent variables as explanatory variables. Likewise, the measurement model for $Y_{j}$, the $j^{\text{th}}$ ordinal indicator of $η$, is:

$$g_{j}^{(y)} (P (Y_{j} \leq v_{j} \mid η)) = α_{y, j}^{(v_{j})} - β_{y, j}^{T} η, \quad v_{j} \in \{1, \ldots, V_{j}\}, \tag{2}$$

where $V_{j}$ is the number of response categories for the ordinal indicator $Y_{j}$, $g_{j}^{(y)}$ is the link function, $α_{y, j}^{(v_{j})}$ is the category-specific intercept for observing $v_{j}$, and $β_{y, j}$ is the factor loading vector for $Y_{j}$. $ξ$ is assumed to have a $0$ mean vector and a covariance matrix $Φ$. Further, partition $η$ into a $K_{η_{1}} \times 1$ vector $η_{1}$ and a $K_{η_{2}} \times 1$ vector $η_{2}$. The structural model of interest is:

$$η_{1} = τ_{1} + B_{11} η_{1} + B_{12} η_{2} + Γ_{1} ξ + {(I_{K_{η_{1}}} \otimes ξ)}^{T} Ω_{1} ξ + {(I_{K_{η_{1}}} \otimes ξ)}^{T} Π η_{2} + {(I_{K_{η_{1}}} \otimes η_{2})}^{T} Ξ η_{2} + ζ_{1}, \tag{3}$$

$$η_{2} = τ_{2} + B_{22} η_{2} + Γ_{2} ξ + {(I_{K_{η_{2}}} \otimes ξ)}^{T} Ω_{2} ξ + ζ_{2}, \tag{4}$$

where $I_{a}$ is an $a \times a$ identity matrix and $\otimes$ denotes the Kronecker product. The $B$ matrices represent linear effects among endogenous latent variables, where $B_{11}$ and $B_{22}$ are $K_{η_{1}} \times K_{η_{1}}$ and $K_{η_{2}} \times K_{η_{2}}$ matrices, respectively, with zero diagonal elements, and $B_{12}$ is a $K_{η_{1}} \times K_{η_{2}}$ matrix. The $Ω$ matrices represent non-linear effects of exogenous latent variables on the endogenous latent variables, where $Ω_{1}$ and $Ω_{2}$ are $K_{η_{1}} K_{ξ} \times K_{ξ}$ and $K_{η_{2}} K_{ξ} \times K_{ξ}$ matrices, respectively. $Π$ is a $K_{η_{1}} K_{ξ} \times K_{η_{2}}$ matrix representing interaction effects between endogenous and exogenous latent variables, and $Ξ$ is a $K_{η_{1}} K_{η_{2}} \times K_{η_{2}}$ matrix representing non-linear effects of $η_{2}$ on $η_{1}$. The matrices $Ω_{1}$, $Ξ$, and $Ω_{2}$ are block matrices stacking upper-triangular matrices on top of one another. For example, if $K_{η_{1}} = K_{η_{2}} = K_{ξ} = 2$, then $Ω_{1}$, $Ω_{2}$, and $Ξ$ each consist of two $2 \times 2$ upper-triangular matrices.

The diagonal elements of the upper-triangular matrices correspond to the quadratic effects, whereas the off-diagonal elements correspond to the interaction effects. The error terms $ζ_{1}$ and $ζ_{2}$ are assumed to have zero mean vectors and are independent of each other with $K_{η_{1}} \times K_{η_{1}}$ and $K_{η_{2}} \times K_{η_{2}}$ covariance matrices $Ψ_{11}$ and $Ψ_{22}$, respectively. $τ_{1}$ and $τ_{2}$ are the intercepts. The intercept $τ_{2}$ is determined such that $E (η_{2}) = 0$, that is, the $j^{\text{th}}$ element of $τ_{2}$ is defined by:

$$τ_{2, j} = - \operatorname{tr} (Ω_{j}^{(2)} Φ),$$

where $Ω_{j}^{(2)}$ is the $j^{\text{th}}$ upper-triangular matrix in $Ω_{2}$. For simplicity, we let $τ_{1} = 0$. Thus, $E (η_{1})$ is not zero.
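To make the block structure concrete, the following Python sketch evaluates the quadratic term ${(I \otimes ξ)}^{T} Ω ξ$ block by block and computes the intercept $τ_{2}$ from the trace formula above. It relies on the standard identity $E (ξ^{T} A ξ) = \operatorname{tr} (A Φ)$ for a zero-mean $ξ$ with covariance $Φ$. The function names are hypothetical and the numerical values are chosen for illustration only (they happen to match the $Ω_{2}$ and $Φ$ used later in the simulation design); this is not the authors' implementation.

```python
def quadratic_terms(omega_blocks, xi):
    """Return [xi' Omega_j xi for each block j], i.e. (I kron xi)' Omega xi.

    omega_blocks is the stacked matrix Omega, stored as a list of
    K_xi x K_xi upper-triangular blocks (one block per endogenous variable).
    """
    k = len(xi)
    return [
        sum(block[r][c] * xi[r] * xi[c] for r in range(k) for c in range(k))
        for block in omega_blocks
    ]


def trace_of_product(a, b):
    """trace(A B) for two square matrices stored as lists of lists."""
    n = len(a)
    return sum(a[r][c] * b[c][r] for r in range(n) for c in range(n))


def tau2(omega2_blocks, phi):
    """Intercepts tau_{2,j} = -trace(Omega_j^{(2)} Phi), so that E(eta_2) = 0."""
    return [-trace_of_product(block, phi) for block in omega2_blocks]


# Illustrative values (assumption): one endogenous eta_2, two exogenous xi.
omega2_blocks = [[[0.20, 0.25],
                  [0.00, 0.20]]]
phi = [[1.0, 0.4],
       [0.4, 1.0]]

tau = tau2(omega2_blocks, phi)
quad = quadratic_terms(omega2_blocks, [1.0, -1.0])
print("tau_2:", tau)
print("quadratic term at xi = (1, -1):", quad)
```

Storing $Ω$ as a list of blocks avoids forming the Kronecker product explicitly: row $j$ of ${(I \otimes ξ)}^{T} Ω$ is simply $ξ^{T} Ω_{j}$.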

To our knowledge, the majority of studies on non-linear SEM have overlooked Equation (3) and focused on Equation (4). Take the path diagram of an extended quadratic structural model in Figure 1 as an example. If Equation (3) is ignored, researchers are not able to model the paths in the dashed rectangle. As mentioned previously, non-linear effects among endogenous and exogenous latent variables are commonly encountered in practice. Hence, it is important to include Equation (3) in the structural model.

Figure 1. The path diagram of the structural model of an extended quadratic structural equation model. The dashed paths correspond to the effects in Equation (3) and the solid paths correspond to the effects in Equation (4).

Marginal maximum likelihood estimation

Under the assumption of independent observations, the complete likelihood function is:

$$\prod_{i = 1}^{n} f (y_{i}, x_{i}, η_{i}, ξ_{i}),$$

where $n$ is the number of observations, and $y_{i}$ and $x_{i}$ are the $i^{\text{th}}$ observed ordinal vectors of indicators. Under the conditional independence assumption, the complete likelihood becomes:

$$\prod_{i = 1}^{n} f (y_{i}, x_{i}, η_{i}, ξ_{i}) = \prod_{i = 1}^{n} f (y_{i} \mid η_{i}) f (x_{i} \mid ξ_{i}) f (η_{1, i} \mid η_{2, i}, ξ_{i}) f (η_{2, i} \mid ξ_{i}) f (ξ_{i}) . \tag{5}$$

We assume that $ξ \sim N (0, Φ)$ , $ζ_{1} \sim N (0, Ψ_{11})$ , and $ζ_{2} \sim N (0, Ψ_{22})$ . Hence, $η_{1, i} | (η_{2, i}, ξ_{i})$ , $η_{2, i} | ξ_{i}$ , and $ξ_{i}$ are multivariate normal. The expression of $f (y_{i}, x_{i}, η_{i}, ξ_{i})$ under these normal assumptions can be found in the appendix.
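The factorization in Equation (5) also describes how one observation can be generated: draw $ξ$, then $η_{2}$ given $ξ$, then $η_{1}$ given $(η_{2}, ξ)$, and finally the ordinal indicators given the latent variables. The sketch below follows that order in plain Python. It borrows the population values from the simulation design later in the article and assumes a probit link; the function names, the seed, and the sample size are hypothetical choices made for illustration, not the authors' code.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the probit link


def draw_latents(rng):
    """Draw (xi, eta1, eta2) once, following the factorization in Equation (5)."""
    # f(xi): bivariate normal, unit variances, covariance 0.40 (Cholesky by hand)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    xi1 = z1
    xi2 = 0.40 * z1 + math.sqrt(1.0 - 0.40 ** 2) * z2
    # f(eta2 | xi): Equation (4) with B22 = 0 and tau2 = -trace(Omega2 Phi)
    tau2 = -(0.20 + 0.25 * 0.40 + 0.20)
    eta2 = (tau2 + 0.35 * xi1 + 0.35 * xi2
            + 0.20 * xi1 ** 2 + 0.25 * xi1 * xi2 + 0.20 * xi2 ** 2
            + rng.gauss(0.0, math.sqrt(1.50)))
    # f(eta1 | eta2, xi): Equation (3) with tau1 = 0 and B11 = 0
    eta1 = (0.15 * eta2 + 0.15 * xi1 + 0.15 * xi2
            + 0.15 * xi1 ** 2 + 0.20 * xi1 * xi2 + 0.15 * xi2 ** 2
            + 0.10 * xi1 * eta2 + 0.10 * xi2 * eta2
            + 0.10 * eta2 ** 2
            + rng.gauss(0.0, math.sqrt(2.00)))
    return (xi1, xi2), eta1, eta2


def draw_ordinal(thresholds, loading, latent, rng):
    """Draw a category from P(X <= u | latent) = Phi(alpha_u - loading * latent)."""
    u = rng.random()
    cumulative = [STD_NORMAL.cdf(a - loading * latent) for a in thresholds]
    # The category is 1 plus the number of cumulative probabilities below u.
    return 1 + sum(u > p for p in cumulative)


rng = random.Random(2019)  # arbitrary seed, for reproducibility only
draws = [draw_latents(rng) for _ in range(20000)]
mean_eta1 = sum(d[1] for d in draws) / len(draws)
mean_eta2 = sum(d[2] for d in draws) / len(draws)
# First indicator of xi_1, using the first column of alpha_x and loading 2.00
x1 = [draw_ordinal([-2.32, 0.57, 1.51, 2.87], 2.00, d[0][0], rng) for d in draws]
print("sample mean of eta_1:", round(mean_eta1, 3))
print("sample mean of eta_2:", round(mean_eta2, 3))
```

With this parameterization the sample mean of $η_{2}$ should be close to zero by construction of $τ_{2}$, whereas the sample mean of $η_{1}$ should not, since $τ_{1} = 0$.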

In practice, $η_{1}$ is often a scalar that has only one binary indicator. For example, $y_{1} = 1$ if a behavior is conducted and 0 if not. If that is the case, further restrictions are needed for identification. One way is to let $Ψ_{11} = 0$ and $β_{y, 1} = 1$. This means that $Ψ_{11}$ is singular and $η_{1, i} \mid (η_{2, i}, ξ_{i})$ is a degenerate (one-point) distribution. Equivalently, $η_{1}$ is completely determined conditional on $ξ$ and $η_{2}$.

Approximated observed log-likelihood

Let $θ$ be the vector of parameters that includes the unconstrained elements in the structural matrices $B_{11}$, $B_{12}$, $B_{22}$, $Γ_{1}$, $Γ_{2}$, $Ω_{1}$, $Ω_{2}$, $Π$, and $Ξ$, the unconstrained elements in the covariance matrices $Φ$, $Ψ_{11}$, and $Ψ_{22}$, and the unconstrained intercepts and factor loadings $α_{y, j}^{(v_{j})}$, $α_{x, j}^{(u_{j})}$, $β_{y, j}$, and $β_{x, j}$. In order to obtain the maximum likelihood estimator of $θ$, the observed likelihood function is calculated by integrating out $Υ_{i} = (η_{i}^{T}, ξ_{i}^{T})^{T}$ with $η_{i} = (η_{1, i}^{T}, η_{2, i}^{T})^{T}$ from Equation (5), i.e.

$$ℓ (θ) = \sum_{i = 1}^{n} \log \int f (y_{i}, x_{i}, Υ_{i}; θ) \, d Υ_{i} . \tag{6}$$

The integral in Equation (6) is intractable and approximations are needed. The approximated maximum likelihood estimator is obtained by maximizing the approximated observed log-likelihood function. Commonly used numerical approximation methods include the Laplace approximation (e.g., Huber, Ronchetti, & Victoria-Feser, 2004) and the Gauss–Hermite quadrature-based methods (e.g., Moustaki, 1996; Moustaki & Knott, 2000; Rabe-Hesketh & Skrondal, 2004; Rizopoulos & Moustaki, 2008). Alternatively, the observed gradient can be approximated by stochastic methods (e.g., Cai, 2010). In the present study, the second-order Laplace approximation (Shun & McCullagh, 1995) and the adaptive Gauss–Hermite quadrature (AGHQ) approximation of Liu and Pierce (1994) are implemented. The second-order Laplace approximation is second-order accurate and the accuracy of the AGHQ approximation depends on the number of quadrature points. A general introduction to these methods and the expressions of the approximations to Equation (6) can be found in the appendix.
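As a rough illustration of how quadrature replaces an integral over a latent variable, the sketch below applies ordinary (non-adaptive) 5-point Gauss–Hermite quadrature to a toy one-dimensional problem: the marginal probability $\int Φ(α - β z) φ(z) \, dz$ of a single probit threshold with a standard normal latent variable, for which the closed form $Φ(α / \sqrt{1 + β^{2}})$ is available as a check. This is deliberately simpler than the adaptive version used in the article, which additionally recenters and rescales the nodes for each observation, and it is not the second-order Laplace approximation. The nodes and weights are the tabulated values for the weight function $e^{-x^{2}}$; the values of $α$ and $β$ and all function names are assumptions made for the example.

```python
import math
from statistics import NormalDist

# Tabulated 5-point Gauss-Hermite rule for the weight function exp(-x^2).
GH5_NODES = [-2.0201828704560856, -0.9585724646138185, 0.0,
             0.9585724646138185, 2.0201828704560856]
GH5_WEIGHTS = [0.01995324205904591, 0.3936193231522412, 0.9453087204829419,
               0.3936193231522412, 0.01995324205904591]

STD_NORMAL = NormalDist()


def expect_std_normal(h):
    """Approximate E[h(Z)] for Z ~ N(0, 1) with the 5-point rule.

    The change of variable z = sqrt(2) * x maps the N(0, 1) density onto the
    Gauss-Hermite weight exp(-x^2), leaving a factor 1 / sqrt(pi).
    """
    total = sum(w * h(math.sqrt(2.0) * x) for x, w in zip(GH5_NODES, GH5_WEIGHTS))
    return total / math.sqrt(math.pi)


# Toy threshold model (illustrative values): P(X <= u | z) = Phi(alpha - beta * z)
alpha, beta = 0.60, 1.00
approx = expect_std_normal(lambda z: STD_NORMAL.cdf(alpha - beta * z))
exact = STD_NORMAL.cdf(alpha / math.sqrt(1.0 + beta ** 2))
print("quadrature:", approx)
print("closed form:", exact)
```

In a model with several latent variables the same rule is applied on a product grid, so the number of function evaluations grows exponentially with the number of latent variables. That growth is what makes the Laplace approximation attractive in higher dimensions.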

Maximizing the approximated observed log-likelihood

A common approach in latent variable models is to employ the EM algorithm or its variants, where the latent variables are treated as missing values and the gradient of the conditional expectation in the E-step is approximated in the M-step. For example, the EM algorithm is used by Lee and Song (2004b), Lee, Song, and Lee (2003), and Lee and Zhu (2000), among others. The EM algorithm is known for its slow speed in many situations. When the integrals in the M-step are approximated, various studies (e.g., Bianconcini & Cagnone, 2012; Rizopoulos, Verbeke, & Lesaffre, 2009; Steele, 1996) argued that the fully exponential Laplace approximation (Tierney & Kadane, 1986; Tierney, Kass, & Kadane, 1989) is more accurate than the Laplace approximation. Jin and Andersson (2019a) proved that the estimator based on the EM algorithm with the fully exponential Laplace approximation is the same as the estimator that maximizes the Laplace approximated observed log-likelihood function, if implemented properly. Hence, we propose to directly maximize the approximated $ℓ (θ)$ instead of using the EM algorithm.

The gradients of the approximations can be readily obtained by the chain rule, as explained in detail in the appendix. A quasi-Newton method (e.g., the BFGS algorithm) can be used to find the maximizer of the approximated observed log-likelihood function. Since the approximations are subject to approximation error, the proposed approach can be interpreted as a quasi-maximum likelihood approach (Huber et al., 2004). If the approximated log-likelihood function is directly maximized, the gradient of the approximation is already available, which makes the standard errors easy to estimate; see Equation (10) in the appendix, ${\left(\widehat{\frac{\partial^{2} ℓ (θ)}{\partial θ \partial θ^{T}}}\right)}^{- 1} \sum_{i = 1}^{n} \left[\frac{\partial ℓ_{i} (\hat{θ})}{\partial θ} \frac{\partial ℓ_{i} (\hat{θ})}{\partial θ^{T}}\right] {\left(\widehat{\frac{\partial^{2} ℓ (θ)}{\partial θ \partial θ^{T}}}\right)}^{- 1}$. If the EM algorithm is used, the information matrix needed for standard errors is approximated afterward by, for example, the Louis method (Louis, 1982).
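The sandwich form (inverse Hessian, times the sum of outer products of per-observation gradients, times inverse Hessian) can be illustrated on a model small enough to check by hand. The sketch below uses a one-parameter Bernoulli likelihood rather than the SEM of this article, so every name in it is a hypothetical stand-in; it only shows how per-observation scores and the Hessian of the total log-likelihood combine once the maximizer is known. For this toy model the sandwich variance should reduce to the familiar $\hat{p} (1 - \hat{p}) / n$.

```python
import math


def bernoulli_scores(data, p):
    """Per-observation gradients of log f(x; p) = x log p + (1 - x) log(1 - p)."""
    return [x / p - (1 - x) / (1 - p) for x in data]


def bernoulli_hessian(data, p):
    """Second derivative of the total log-likelihood with respect to p."""
    return -sum(x / p ** 2 + (1 - x) / (1 - p) ** 2 for x in data)


def sandwich_variance(scores, hessian):
    """Scalar sandwich: H^{-1} * sum_i g_i^2 * H^{-1}."""
    meat = sum(g * g for g in scores)
    return meat / hessian ** 2


data = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # toy data, 4 successes out of 10
p_hat = sum(data) / len(data)            # maximum likelihood estimate
scores = bernoulli_scores(data, p_hat)
variance = sandwich_variance(scores, bernoulli_hessian(data, p_hat))
standard_error = math.sqrt(variance)
print("p_hat:", p_hat, "sandwich standard error:", standard_error)
```

In the multi-parameter case the scalar division becomes two matrix solves with the Hessian, and the per-observation gradients are exactly the quantities that direct maximization already computes at each iteration.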

Simulation study

In order to investigate the performance of the suggested approach, a simulation study is conducted. The setup and results are presented in the following subsections.

Simulation design

In the simulation, a four-factor model is considered with two exogenous latent variables $\{ξ_{1}, ξ_{2}\}$ and two endogenous latent variables $\{η_{1}, η_{2}\}$. Each latent variable is associated with three ordinal indicators with five categories each. $ξ$, $ζ_{1}$, and $ζ_{2}$ are generated from multivariate normal distributions. The population parameters of the measurement model (Equations (1) and (2)) are

$$
\begin{aligned}
α_{x}^{T} &= \begin{pmatrix} -2.32 & -2.13 & -1.96 & -2.32 & -2.13 & -1.96 \\ 0.57 & 0.52 & 0.48 & 0.57 & 0.52 & 0.48 \\ 1.51 & 1.39 & 1.27 & 1.51 & 1.39 & 1.27 \\ 2.87 & 2.64 & 2.42 & 2.87 & 2.64 & 2.42 \end{pmatrix}, \\
β_{x}^{T} &= \begin{pmatrix} 2.00 & 1.80 & 1.60 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2.00 & 1.80 & 1.60 \end{pmatrix}, \\
α_{y}^{T} &= \begin{pmatrix} -1.76 & -1.70 & -1.66 & -2.20 & -2.14 & -2.08 \\ -0.42 & -0.40 & -0.35 & -0.91 & -0.88 & -0.87 \\ 0.60 & 0.56 & 0.52 & 0.00 & 0.00 & 0.00 \\ 2.65 & 2.60 & 2.50 & 1.79 & 1.76 & 1.72 \end{pmatrix}, \\
β_{y}^{T} &= \begin{pmatrix} 1.00 & 0.95 & 0.90 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1.00 & 0.95 & 0.90 \end{pmatrix},
\end{aligned}
$$

where zero elements are restricted to zero. Here the $(j, k)^{\text{th}}$ element in $α_{x}$ is $α_{x, j}^{(k)}$ and the $j^{\text{th}}$ row of $β_{x}$ is $β_{x, j}$ in Equation (1). Likewise, the $(j, k)^{\text{th}}$ element in $α_{y}$ is $α_{y, j}^{(k)}$ and the $j^{\text{th}}$ row of $β_{y}$ is $β_{y, j}$ in Equation (2). Each indicator loads on only one latent variable, which gives the setup a simple structure. $β_{y, 11}$ and $β_{y, 42}$ are fixed to $1$ for identification. The population covariance between the latent exogenous variables is 0.40 and the variances are fixed to 1 for identification. Ordinal observations are generated by the measurement model using the probit link function. Equations (3) and (4) are set to be the same as in Figure 1. The population values of the structural parameters are

$$
\begin{aligned}
B_{11} &= 0, \quad B_{12} = 0.15, \quad Γ_{1} = \begin{pmatrix} 0.15 & 0.15 \end{pmatrix}, \quad Ω_{1} = \begin{pmatrix} 0.15 & 0.20 \\ 0 & 0.15 \end{pmatrix}, \quad Π = \begin{pmatrix} 0.10 \\ 0.10 \end{pmatrix}, \quad Ξ = 0.1, \\
B_{22} &= 0, \quad Γ_{2} = \begin{pmatrix} 0.35 & 0.35 \end{pmatrix}, \quad Ω_{2} = \begin{pmatrix} 0.20 & 0.25 \\ 0 & 0.20 \end{pmatrix},
\end{aligned}
$$

where the zero elements are restricted to zero. The variances of the error terms are $Ψ_{11} = 2.00$ and $Ψ_{22} = 1.50$. Such population values are chosen to reflect common values in practice. The $α_{x}$ and $β_{x}$ are chosen such that the reliabilities of the items range from 0.72 to 0.80 and the probabilities of observing each outcome are 0.15, 0.45, 0.15, 0.15, and 0.10, respectively, for all items. The structural parameters are chosen such that the $R^{2}$ values of the latent regressions are approximately $0.43$ and $0.34$. The $α_{y}$ and $β_{y}$ are chosen such that the reliabilities of the items range from 0.65 to 0.78 and the probabilities of observing each outcome are approximately 0.10, 0.20, 0.20, 0.35, and 0.15 for all items. Four different sample sizes are considered, i.e., $n \in \{400, 600, 800, 1000\}$, and 2000 replications are generated for each sample size. Two second-order accurate approximation methods to the observed log-likelihood are considered in the simulation, namely, the second-order Laplace approximation, denoted by Lap(2nd), and the AGHQ with 5 quadrature points per latent variable, denoted by AGHQ(5p). We expect them to perform similarly in terms of bias. The approximated maximum likelihood estimator is obtained by directly maximizing the approximated observed log-likelihood with the BFGS algorithm. The algorithm is stopped when the maximum absolute change in the parameter estimates between two consecutive iterations is less than $10^{- 4}$. The code was written in R (R Core Team, 2018) with the aid of the package Rcpp (Eddelbuettel, 2013).
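The stated outcome probabilities and reliabilities for the indicators of $ξ$ can be checked with a few lines of Python. Under a probit link and a normal latent variable with variance $φ$, the marginal cumulative probability is $Φ(α / \sqrt{1 + β^{2} φ})$. The reliability formula used below, $β^{2} φ / (β^{2} φ + 1)$, is an assumption on our part (the share of the underlying latent response variance explained by the factor); the article does not spell out its definition, although this one reproduces the reported range. Function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def marginal_category_probs(thresholds, loading, latent_var=1.0):
    """Marginal category probabilities for one probit item with one factor."""
    scale = math.sqrt(1.0 + loading ** 2 * latent_var)
    cumulative = [STD_NORMAL.cdf(a / scale) for a in thresholds] + [1.0]
    return [cumulative[0]] + [cumulative[k] - cumulative[k - 1]
                              for k in range(1, len(cumulative))]


def reliability(loading, latent_var=1.0):
    """Assumed definition: explained share of the latent response variance."""
    explained = loading ** 2 * latent_var
    return explained / (explained + 1.0)


# The three indicators of xi_1 (first three columns of alpha_x^T and beta_x^T)
items = [
    ([-2.32, 0.57, 1.51, 2.87], 2.00),
    ([-2.13, 0.52, 1.39, 2.64], 1.80),
    ([-1.96, 0.48, 1.27, 2.42], 1.60),
]
probs = [marginal_category_probs(a, b) for a, b in items]
rels = [reliability(b) for _, b in items]
for p, r in zip(probs, rels):
    print([round(v, 3) for v in p], round(r, 3))
```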

Simulation results

Replications that do not converge and those with nonpositive definite covariance matrices are excluded from the analysis. The percentage of excluded cases is reported in Table 1. The proportion of inadmissible cases is generally negligible and decreases as the sample size increases.

Table 1. Percentage of inadmissible cases excluded from the analysis

In order to investigate the performance of the methods, the relative bias $RB_{p} = 100 R^{- 1} \sum_{i = 1}^{R} θ_{p}^{- 1} ({\hat{θ}}_{p, i} - θ_{p})$ and the root mean squared error $RMSE_{p} = \sqrt{R^{- 1} \sum_{i = 1}^{R} {[{\hat{θ}}_{p, i} - θ_{p}]}^{2}}$ are calculated for each structural parameter $θ_{p}$, where $R$ is the number of replications and ${\hat{θ}}_{p, i}$ is the corresponding estimate at the $i^{\text{th}}$ replication. Table 2 shows the RB and RMSE for the structural parameters $B_{11}$, $B_{12}$, $B_{22}$, $Γ_{1}$, $Γ_{2}$, $Ω_{1}$, $Ω_{2}$, $Π$, and $Ξ$. Following the common practice in the SEM literature (e.g., Curran, West, & Finch, 1996; Flora & Curran, 2004), a relative bias lower than $5\%$ is regarded as low. Generally, the methods provide low biases for the sample sizes investigated. The RMSE decreases with sample size as expected. Lap(2nd) and AGHQ(5p) perform similarly in terms of RB and RMSE across parameters and sample sizes. In order to evaluate the performance of the calculated standard errors, coverage probabilities at the nominal level of 95% are presented in Table 3. The coverage probabilities are generally close to the nominal level. Results not shown here indicate that the first-order Laplace approximation, which is equivalent to the adaptive Gauss–Hermite quadrature approximation with 1 quadrature point, can perform unacceptably poorly for non-linear terms.
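The three performance measures are straightforward to compute from a vector of replicated estimates. The following Python sketch implements the RB and RMSE formulas as written above, together with the empirical coverage of a Wald interval $\hat{θ} \pm z \cdot \mathrm{SE}$. The function names and the four made-up "replications" are purely illustrative and are not taken from the article's results.

```python
import math


def relative_bias(estimates, true_value):
    """RB = 100 * mean((theta_hat_i - theta) / theta), in percent."""
    return 100.0 * sum((e - true_value) / true_value for e in estimates) / len(estimates)


def rmse(estimates, true_value):
    """Root mean squared error of the estimates around the true value."""
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))


def wald_coverage(estimates, std_errors, true_value, z=1.959963984540054):
    """Percentage of Wald intervals theta_hat +/- z * se that contain the truth."""
    hits = sum(abs(e - true_value) <= z * se for e, se in zip(estimates, std_errors))
    return 100.0 * hits / len(estimates)


# Made-up replications for a parameter whose true value is 0.15
estimates = [0.14, 0.16, 0.15, 0.17]
std_errors = [0.008, 0.008, 0.008, 0.008]
rb = relative_bias(estimates, 0.15)
err = rmse(estimates, 0.15)
cov = wald_coverage(estimates, std_errors, 0.15)
print("RB (%):", rb, "RMSE:", err, "coverage (%):", cov)
```

Because RB divides by the true value, it is undefined for parameters whose population value is zero, such as $B_{11}$ and $B_{22}$ in this design; how the article reports those cases is not stated in this section.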

Table 2. Relative bias and root mean squared error of the estimators of structural parameters in Equations (3) and (4)

Table 3. Coverage probabilities (%) of the Wald confidence interval for structural parameters in Equations (3) and (4) at the nominal level of 95%

The Laplace approximations are expected to be computationally more efficient than AGHQ approximations for similar error rates if the number of latent variables is large. However, a higher-order Laplace approximation requires calculations of higher-order derivatives, which are not straightforward, whereas the AGHQ approximation offers the possibility of decreasing error rates by increasing the number of quadrature points, with the drawback of being computationally heavier. In the current simulation, the average estimation time for the sample size $n = 1000$ is 90 s for AGHQ(5p) and 49 s for Lap(2nd). This relative difference is hypothesized to increase with an increasing number of latent variables.

Empirical example

In order to demonstrate how the proposed method works in practice, an empirical example is considered in this section. The empirical example contains 707 complete responses of households in two Swedish municipalities, Sollentuna and Saltjö-boo, regarding their behavior of off-peak hour energy use (Bartusch, Juslin, Stitkvoort, Yang-Wallentin, & Öhrlund, 2019), in order to investigate the roles of the psychological factors Attitude ($ξ_{1}$), Belief ($ξ_{2}$), Perceived Behavioral Control ($ξ_{3}$), and Intentions ($η_{2}$) in relation to the Behavior ($η_{1}$) of shifting the energy use to off-peak hours. It is hypothesized that Behavior ($η_{1}$) is affected by an interaction between Perceived Behavioral Control ($ξ_{3}$) and Intentions ($η_{2}$). $η_{1}$ has only one continuous indicator, which is dichotomized into a binary indicator. The other latent variables are associated with two indicators each, in the form of 7-point Likert scale items. The 7-point Likert scale items were aggregated into five categories in order to avoid categories with too few observations. For the purpose of modeling the interaction effects, we also include $ξ_{3}^{2}$ and $η_{2}^{2}$ in the model, as suggested by Kelava and Brandt (2009). The structural model of interest is then:

$$η_{1} = B_{12} η_{2} + \begin{pmatrix} 0 & 0 & Γ_{13}^{(1)} \end{pmatrix} ξ + ξ^{T} \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & Ω_{33}^{(1)} \end{pmatrix} ξ + ξ^{T} \begin{pmatrix} 0 \\ 0 \\ Π_{31} \end{pmatrix} η_{2} + Ξ_{11} η_{2}^{2},$$

$$η_{2} = τ_{2} + \begin{pmatrix} Γ_{11}^{(2)} & Γ_{12}^{(2)} & Γ_{13}^{(2)} \end{pmatrix} ξ + ξ^{T} \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} ξ + ζ_{2} .$$

Table 4 shows the estimates and the 95% confidence intervals for the structural parameters using the AGHQ approximation with 5 quadrature points per latent variable and the second-order Laplace approximation. None of the non-linear parameters $Ω_{33}^{(1)}$, $Π_{31}$, and $Ξ_{11}$ is significant at the 5% significance level. The linear effects of Attitude and Belief on Intention ($Γ_{11}^{(2)}$ and $Γ_{12}^{(2)}$) are significant at the 5% significance level, whereas the linear effects of Perceived Behavioral Control on Intention and Behavior ($Γ_{13}^{(2)}$ and $Γ_{13}^{(1)}$) and of Intention on Behavior ($B_{12}$) are not significant at the 5% significance level. Results not presented here show that similar estimates are obtained when 7 quadrature points are used, which indicates that the approximation has stabilized.

Table 4. Point estimates and 95% confidence intervals for structural parameters in the empirical example corresponding to non-linear effects for the AGHQ approximation with 5 quadrature points, denoted by AGHQ(5p), and the second-order Laplace approximation, denoted by Lap(2nd)

Discussion and conclusion

In this article, a structural model with quadratic and interaction relations is considered in the presence of ordinal data. Structural models including interactions between exogenous and endogenous latent variables are rarely found in the literature. The current study provides the possibility to include such features. To this end, a marginal maximum likelihood approach is proposed, where the exact gradient of the approximated observed log-likelihood is computed and used to approximate the Hessian, providing the opportunity to obtain standard errors straightforwardly, which are not easily available using, for example, the EM algorithm.

Simulation results indicate that the second-order Laplace approximation and the AGHQ approximation with 5 quadrature points per latent variable provide accurate parameter estimators at finite sample sizes. On the one hand, although only simple structures are considered in the current study, the proposed approach works for general loading structures, provided that the model is identified. On the other hand, more research is needed to provide a rule-of-thumb sample size requirement for non-linear models with interactions between endogenous and exogenous variables. We leave this as a topic for future research.

In this study, the Laplace approximation and the AGHQ approximation are used to approximate the log-likelihood function, where the quadrature nodes are chosen deterministically. Alternative approaches include the Monte Carlo-based maximum likelihood approaches (e.g., Lee & Song, 2003; Song & Lee, 2005) and the Bayesian approaches (e.g., Lee & Song, 2008; Lee & Zhu, 2000). To our knowledge, they are not yet implemented in the non-linear SEM model of our interest. This remains a topic for future studies.

A drawback of the method is that it relies on normally distributed latent exogenous variables and error terms. It remains for future research to investigate the sensitivity against nonnormality and develop robust approaches. For example, non-normality in the latent exogenous variables could be modeled by mixtures of normals.

Acknowledgment

We thank the reviewers for useful comments that led to an improved manuscript.

Additional information

Funding

The work was supported by the Swedish Research Council [contract number 2017-01175].

References

Agresti, A. (2015). Foundations of linear and generalized linear models. Hoboken, NJ: John Wiley & Sons.
Agustin, C., & Singh, J. (2005). Curvilinear effects of consumer loyalty determinants in relational exchanges. Journal of Marketing Research, 42, 96–108. doi:10.1509/jmkr.42.1.96.56961
Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision Processes, 50, 179–211. doi:10.1016/0749-5978(91)90020-T
Barndorff-Nielsen, O., & Cox, D. (1989). Asymptotic techniques for use in statistics. London, UK: Chapman and Hall.
Bartusch, C., Juslin, P., Stitkvoort, B., Yang-Wallentin, F., & Öhrlund, I. (2019). The black box of demand response (Unpublished manuscript). Department of Engineering Sciences, Uppsala University, Uppsala, Sweden.
Bianconcini, S. (2014). Asymptotic properties of adaptive maximum likelihood estimators in latent variable models. Bernoulli, 20, 1507–1531. doi:10.3150/13-BEJ531
Bianconcini, S., & Cagnone, S. (2012). Estimation of generalized linear latent variable models via full exponential Laplace approximation. Journal of Multivariate Analysis, 112, 183–193. doi:10.1016/j.jmva.2012.06.005
Bollen, K. A. (1995). Structural equation models that are nonlinear in latent variables: A least-squares estimator. Sociological Methodology, 25, 223–251. doi:10.2307/271068
Bollen, K. A. (1996). An alternative two stage least squares (2SLS) estimator for latent variable equations. Psychometrika, 61, 109–121. doi:10.1007/BF02296961
Brandt, H., Kelava, A., & Klein, A. (2014). A simulation study comparing recent approaches for the estimation of nonlinear effects in SEM under the condition of nonnormality. Structural Equation Modeling, 21, 181–195. doi:10.1080/10705511.2014.882660
Cai, J. H., Song, X. Y., & Lee, S. Y. (2008). Bayesian analysis of nonlinear structural equation models with mixed continuous, ordered and unordered categorical, and nonignorable missing data. Statistics and Its Interface, 1, 99–114. doi:10.4310/SII.2008.v1.n1.a9
Cai, L. (2010). High-dimensional exploratory item factor analysis by a Metropolis-Hastings Robbins-Monro algorithm. Psychometrika, 75, 33–57. doi:10.1007/s11336-009-9136-x
R Core Team. (2018). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
Curran, P. J., West, S. G., & Finch, J. F. (1996). The robustness of test statistics to nonnormality and specification error in confirmatory factor analysis. Psychological Methods, 1, 16–29. doi:10.1037/1082-989X.1.1.16
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39, 1–38. doi:10.1111/rssb.1977.39.issue-1
Eddelbuettel, D. (2013). Seamless R and C++ integration with Rcpp. New York, NY: Springer.
Finney, S. J., & DiStefano, C. (2006). Structural equation modeling: A second course. Charlotte, NC: Information Age Publishing.
Flora, D. B., & Curran, P. J. (2004). An empirical evaluation of alternative methods of estimation for confirmatory factor analysis with ordinal data. Psychological Methods, 9, 466–491. doi:10.1037/1082-989X.9.4.466
Harring, J. R., Weiss, B. A., & Hsu, J.-C. (2012). A comparison of methods for estimating quadratic effects in nonlinear structural equation models. Psychological Methods, 17, 193–214. doi:10.1037/a0027539
Huber, P., Ronchetti, E., & Victoria-Feser, M.-P. (2004). Estimation of generalized linear latent variable models. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 66, 893–908. doi:10.1111/j.1467-9868.2004.05627.x
Jin, S., & Andersson, B. (2019a). Gains of fully exponential Laplace approximation in latent variable models (Unpublished manuscript). Department of Statistics, Uppsala University, Uppsala, Sweden.
Jin, S., & Andersson, B. (2019b). A note on the accuracy of adaptive Gauss-Hermite quadrature. Accepted by Biometrika.
Jin, S., Noh, M., & Lee, Y. (2018). H-likelihood approach to factor analysis for ordinal data. Structural Equation Modeling, 25, 530–540. doi:10.1080/10705511.2017.1403287
Kelava, A., & Brandt, H. (2009). Estimation of nonlinear latent structural equation models using the extended unconstrained approach. Review of Psychology, 16, 123–132.
Kelava, A., & Brandt, H. (2014). A general non-linear multilevel structural equation mixture model. Frontiers in Psychology, 5, 748. doi:10.3389/fpsyg.2014.00748
Klein, A., & Moosbrugger, H. (2000). Maximum likelihood estimation of latent interaction effects with the LMS method. Psychometrika, 65, 457–474. doi:10.1007/BF02296338
Klein, A., & Muthén, B. (2007). Quasi-maximum likelihood estimation of structural equation models with multiple interaction and quadratic effects. Multivariate Behavioral Research, 42, 647–673. doi:10.1080/00273170701710205
Lee, S. Y., & Song, X. Y. (2003). Maximum likelihood estimation and model comparison of nonlinear structural equation models with continuous and polytomous variables. Computational Statistics & Data Analysis, 44, 125–142. doi:10.1016/S0167-9473(02)00305-5
Lee, S. Y., Song, X. Y., Cai, J. H., So, W. Y., Ma, C. W., & Chan, C. N. J. (2009). Non-linear structural equation models with correlated continuous and discrete data. British Journal of Mathematical and Statistical Psychology, 62, 327–347. doi:10.1348/000711008X292343
Lee, S. Y., & Zhu, H. T. (2002). Maximum likelihood estimation of nonlinear structural equation models. Psychometrika, 67, 189–210. doi:10.1007/BF02294842
Lee, S.-Y., & Song, X.-Y. (2004a). Bayesian model comparison of nonlinear structural equation models with missing continuous and ordinal categorical data. British Journal of Mathematical and Statistical Psychology, 57, 131–150. doi:10.1348/000711004849204
Lee, S.-Y., & Song, X.-Y. (2004b). Maximum likelihood analysis of a general latent variable model with hierarchical mixed data. Biometrics, 60, 624–636. doi:10.1111/j.0006-341X.2004.00211.x
Lee, S.-Y., & Song, X.-Y. (2007). A unified maximum likelihood approach for analyzing structural equation models with missing nonstandard data. Sociological Methods & Research, 35, 352–381. doi:10.1177/0049124106292357
Lee, S.-Y., & Song, X.-Y. (2008). On Bayesian estimation and model comparison of an integrated structural equation model. Computational Statistics and Data Analysis, 52, 4814–4827. doi:10.1016/j.csda.2008.03.029
Lee, S.-Y., Song, X.-Y., & Cai, J.-H. (2010). A Bayesian approach for nonlinear structural equation models with dichotomous variables using logit and probit links. Structural Equation Modeling, 17, 280–302. doi:10.1080/10705511003659425
Lee, S.-Y., Song, X.-Y., & Lee, J. C. (2003). Maximum likelihood estimation of nonlinear structural equation models with ignorable missing data. Journal of Educational and Behavioral Statistics, 28, 111–134. doi:10.3102/10769986028002111
Lee, S.-Y., & Zhu, H.-T. (2000). Statistical analysis of nonlinear structural equation models with continuous and polytomous data. British Journal of Mathematical and Statistical Psychology, 53, 209–232. doi:10.1348/000711000159303
Lee, Y., & Nelder, J. A. (2001). Hierarchical generalised linear models: A synthesis of generalised linear models, random-effect models and structured dispersions. Biometrika, 88, 987–1006. doi:10.1093/biomet/88.4.987
Liu, Q., & Pierce, D. A. (1994). A note on Gauss-Hermite quadrature. Biometrika, 81, 624–629.
Louis, T. A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 44, 226–233. doi:10.1111/rssb.1982.44.issue-2
Marsh, H. W., Wen, Z., & Hau, K.-T. (2004). Structural equation models of latent interactions: Evaluation of alternative estimation strategies and indicator construction. Psychological Methods, 9, 275–300. doi:10.1037/1082-989X.9.3.275
Mooijaart, A., & Bentler, P. M. (2010). An alternative approach for nonlinear latent variable models. Structural Equation Modeling, 17, 357–373. doi:10.1080/10705511.2010.488997
Moustaki, I. (1996). A latent trait and a latent class model for mixed observed variables. British Journal of Mathematical and Statistical Psychology, 49, 313–334. doi:10.1111/j.2044-8317.1996.tb01091.x
Moustaki, I., & Knott, M. (2000). Generalized latent trait models. Psychometrika, 65, 391–411. doi:10.1007/BF02296153
Ping, R. A. (1995). A parsimonious estimating technique for interaction and quadratic latent variables. Journal of Marketing Research, 32, 336–347. doi:10.1177/002224379503200308
Rabe-Hesketh, S., & Skrondal, A. (2004). Generalized multilevel structural equation modeling. Psychometrika, 69, 167–190. doi:10.1007/BF02295939
Rhemtulla, M., Brosseau-Liard, P. E., & Savalei, V. (2012). When can categorical variables be treated as continuous? A comparison of robust continuous and categorical SEM estimation methods under suboptimal conditions. Psychological Methods, 17, 354–373. doi:10.1037/a0029315
Rizopoulos, D., & Moustaki, I. (2008). Generalized latent variable models with non-linear effects. British Journal of Mathematical and Statistical Psychology, 61, 415–438. doi:10.1348/000711007X213963
Rizopoulos, D., Verbeke, G., & Lesaffre, E. (2009). Fully exponential Laplace approximations for the joint modelling of survival and longitudinal data. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 71, 637–654. doi:10.1111/j.1467-9868.2008.00704.x
Shun, Z., & McCullagh, P. (1995). Laplace approximation of high dimensional integrals. Journal of the Royal Statistical Society, Series B (Methodological), 57, 749–760. doi:10.1111/rssb.1995.57.issue-4
Song, X.-Y., & Lee, S.-Y. (2004). Bayesian analysis of two-level nonlinear structural equation models with continuous and polytomous data. British Journal of Mathematical and Statistical Psychology, 57, 29–52. doi:10.1348/000711004849259
Song, X.-Y., & Lee, S.-Y. (2005). Maximum likelihood analysis of nonlinear structural equation models with dichotomous variables. Multivariate Behavioral Research, 40, 151–177. doi:10.1207/s15327906mbr4002_1
Song, X.-Y., & Lee, S.-Y. (2006a). Bayesian analysis of structural equation models with nonlinear covariates and latent variables. Multivariate Behavioral Research, 41, 337–365. doi:10.1207/s15327906mbr4103_4
Song, X.-Y., & Lee, S.-Y. (2006b). A maximum likelihood approach for multisample nonlinear structural equation models with missing continuous and dichotomous data. Structural Equation Modeling, 13, 325–351. doi:10.1207/s15328007sem1303_1
Steele, B. M. (1996). A modified EM algorithm for estimation in generalized mixed models. Biometrics, 52, 1295–1310. doi:10.2307/2532845
Tierney, L., & Kadane, J. B. (1986). Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association, 81, 82–86. doi:10.1080/01621459.1986.10478240
Tierney, L., Kass, R. E., & Kadane, J. B. (1989). Fully exponential Laplace approximations to expectations and variances of nonpositive functions. Journal of the American Statistical Association, 84, 710–716. doi:10.1080/01621459.1989.10478824
Wall, M. M., & Amemiya, Y. (2000). Estimation for polynomial structural equation models. Journal of the American Statistical Association, 95, 929–940. doi:10.1080/01621459.2000.10474283
Wall, M. M., & Amemiya, Y. (2001). Generalized appended product indicator procedure for nonlinear structural equation analysis. Journal of Educational and Behavioral Statistics, 26, 1–29. doi:10.3102/10769986026001001
Wei, G. C. G., & Tanner, M. A. (1990). A Monte Carlo implementation of the EM algorithm and the poor man’s data augmentation algorithm. Journal of the American Statistical Association, 85, 699–704. doi:10.1080/01621459.1990.10474930
Yang-Jonsson, F. (1997). Non-linear structural equation models: Simulation studies of the Kenny-Judd model (Doctoral dissertation), Department of Statistics, Uppsala University, Uppsala, Sweden.
Zhu, H.-T., & Lee, S.-Y. (1999). Statistical analysis of nonlinear factor analysis models. British Journal of Mathematical and Statistical Psychology, 52, 225–242. doi:10.1348/000711099159080

Appendix

Approximated Maximum Likelihood Estimator

In this section, the proposed maximum likelihood estimator is described in detail. For notational convenience, let the complete log-likelihood be:

h_{i} (Υ_{i}; θ) = log f (y_{i}, x_{i}, Υ_{i}; θ) .

Let $p_{x}$ and $p_{y}$ be the number of indicators associated with $ξ$ and $η$, respectively. Under the distributional assumptions $ξ \sim N (0, Φ)$, $ζ_{1} \sim N (0, Ψ_{11})$, and $ζ_{2} \sim N (0, Ψ_{22})$,

h_{i} (Υ_{i}; θ) = log f (y_{i} | η_{i}) + log f (x_{i} | ξ_{i}) + log f (η_{1, i} | η_{2, i}, ξ_{i}) + log f (η_{2, i} | ξ_{i}) + log f (ξ_{i}),

where

log f (y_{i} | η_{i}) = \sum_{j = 1}^{p_{y}} log P (Y_{i j} = v_{i j} | η_{i}),

log f (x_{i} | ξ_{i}) = \sum_{j = 1}^{p_{x}} log P (X_{i j} = u_{i j} | ξ_{i}),

log f (η_{1, i} | η_{2, i}, ξ_{i}) = - \frac{K_{η_{1}}}{2} log (2 π) - \frac{1}{2} log [det (Ψ_{11})] + log [det (I_{K_{η_{1}}} - B_{11})] - \frac{1}{2} {((I_{K_{η_{1}}} - B_{11}) [η_{1, i} - E (η_{1, i} | η_{2, i}, ξ_{i})])}^{T} Ψ_{11}^{- 1} ((I_{K_{η_{1}}} - B_{11}) [η_{1, i} - E (η_{1, i} | η_{2, i}, ξ_{i})]),

log f (η_{2, i} | ξ_{i}) = - \frac{K_{η_{2}}}{2} log (2 π) - \frac{1}{2} log [det (Ψ_{22})] + log [det (I_{K_{η_{2}}} - B_{22})] - \frac{1}{2} {((I_{K_{η_{2}}} - B_{22}) [η_{2, i} - E (η_{2, i} | ξ_{i})])}^{T} Ψ_{22}^{- 1} ((I_{K_{η_{2}}} - B_{22}) [η_{2, i} - E (η_{2, i} | ξ_{i})]),

log f (ξ_{i}) = - \frac{K_{ξ}}{2} log (2 π) - \frac{1}{2} log |Φ| - \frac{1}{2} ξ_{i}^{T} Φ^{- 1} ξ_{i},

with

E (η_{1, i} | η_{2, i}, ξ_{i}) = {(I_{K_{η_{1}}} - B_{11})}^{- 1} [B_{12} η_{2, i} + Γ_{1} ξ_{i} + (I_{K_{η_{1}}} \otimes ξ_{i}^{T}) Ω_{1} ξ_{i} + (I_{K_{η_{1}}} \otimes ξ_{i}^{T}) Π η_{2, i} + (I_{K_{η_{1}}} \otimes η_{2, i}^{T}) Ξ η_{2, i}],

and

E (η_{2, i} | ξ_{i}) = {(I_{K_{η_{2}}} - B_{22})}^{- 1} [τ_{2} + Γ_{2} ξ_{i} + (I_{K_{η_{2}}} \otimes ξ_{i}^{T}) Ω_{2} ξ_{i}] .
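The Kronecker-product terms in the two conditional means are compact notation for stacked quadratic forms: row $k$ of $(I_{K} \otimes ξ^{T}) Ω ξ$ equals $ξ^{T} Ω_{k} ξ$, where $Ω_{k}$ denotes the $k$th block of $K_{ξ}$ consecutive rows of $Ω$. The following Python sketch checks this identity numerically. The function names and the values of `xi` and `omega` are hypothetical and chosen only for illustration; they are not taken from the authors' implementation.

```python
def kron_identity_rowvec(xi, k):
    """Build (I_k kron xi^T) as a k x (k * len(xi)) nested list."""
    k_xi = len(xi)
    rows = []
    for r in range(k):
        row = [0.0] * (k * k_xi)
        row[r * k_xi:(r + 1) * k_xi] = xi  # xi^T sits in the r-th diagonal block
        rows.append(row)
    return rows

def matmul(a, b):
    """Plain nested-list matrix product."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def matvec(a, v):
    """Plain nested-list matrix-vector product."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def quadratic_term(omega, xi, k):
    """Return (I_k kron xi^T) Omega xi; Omega has k * len(xi) rows, len(xi) columns."""
    return matvec(matmul(kron_identity_rowvec(xi, k), omega), xi)

# Hypothetical example: two endogenous equations (k = 2), two exogenous factors.
xi = [1.0, 2.0]
omega = [[0.5, 0.1], [0.0, -0.2],   # block Omega_1
         [0.0, 0.3], [0.0, 0.0]]    # block Omega_2
quad = quadratic_term(omega, xi, k=2)

# The same numbers obtained block by block as xi^T Omega_r xi.
k_xi = len(xi)
blockwise = [
    sum(xi[a] * omega[r * k_xi + a][b] * xi[b]
        for a in range(k_xi) for b in range(k_xi))
    for r in range(2)
]
```

In this parameterization, the off-diagonal entries of each block carry the interaction effects and the diagonal entries carry the quadratic effects of the corresponding equation.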

Further, $P (Y_{i j} = v_{i j} | η_{i})$ and $P (X_{i j} = u_{i j} | ξ_{i})$ depend on the link function. For example, with the probit link,

P (Y_{i j} = v_{i j} | η_{i}) = \begin{cases} Φ (α_{y, j}^{(1)} - β_{y, j}^{T} η_{i}), & v_{i j} = 1, \\ 1 - Φ (α_{y, j}^{(V_{j} - 1)} - β_{y, j}^{T} η_{i}), & v_{i j} = V_{j}, \\ Φ (α_{y, j}^{(v_{i j})} - β_{y, j}^{T} η_{i}) - Φ (α_{y, j}^{(v_{i j} - 1)} - β_{y, j}^{T} η_{i}), & \text{otherwise}, \end{cases}

P (X_{i j} = u_{i j} | ξ_{i}) = \begin{cases} Φ (α_{x, j}^{(1)} - β_{x, j}^{T} ξ_{i}), & u_{i j} = 1, \\ 1 - Φ (α_{x, j}^{(U_{j} - 1)} - β_{x, j}^{T} ξ_{i}), & u_{i j} = U_{j}, \\ Φ (α_{x, j}^{(u_{i j})} - β_{x, j}^{T} ξ_{i}) - Φ (α_{x, j}^{(u_{i j} - 1)} - β_{x, j}^{T} ξ_{i}), & \text{otherwise}, \end{cases}

where $V_{j}$ is the number of response categories for the ordinal indicator $Y_{j}$, $U_{j}$ is the number of response categories for the ordinal indicator $X_{j}$, and $Φ (\cdot)$ is the cumulative distribution function of a standard normal random variable (not to be confused with the covariance matrix $Φ$ of $ξ$). Then, the observed log-likelihood to be approximated is:

ℓ (θ) = \sum_{i = 1}^{n} log \int f (y_{i}, x_{i}, Υ_{i}; θ) d Υ_{i} = \sum_{i = 1}^{n} log \int exp \{h_{i} (Υ_{i}; θ)\} d Υ_{i} .
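The probit category probabilities that enter $h_{i}$ are differences of adjacent cumulative probabilities, with the cumulative probability fixed at 0 below the first category and at 1 above the last. A minimal Python sketch for a single ordinal indicator follows. The function names, thresholds, loadings, and latent values are hypothetical, and only the probit link from the example above is covered.

```python
import math

def std_normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ordinal_probit_probs(thresholds, loadings, latent):
    """Return [P(Y = 1), ..., P(Y = V)] given the latent vector.

    thresholds: increasing values alpha^(1) < ... < alpha^(V-1)
    loadings:   loading vector beta of the indicator
    latent:     value of the latent vector (eta_i or xi_i)
    """
    linear = sum(b * z for b, z in zip(loadings, latent))
    # Cumulative probabilities P(Y <= v), padded with 0 and 1 at the ends.
    cumulative = [0.0] + [std_normal_cdf(a - linear) for a in thresholds] + [1.0]
    return [hi - lo for lo, hi in zip(cumulative, cumulative[1:])]

# Hypothetical indicator with V = 4 categories loading on two latent variables.
probs = ordinal_probit_probs([-1.0, 0.0, 1.5], [0.8, 0.3], [0.5, -1.0])
observed_category = 3
log_contribution = math.log(probs[observed_category - 1])
```

Summing `log_contribution` over all indicators of an observation gives the terms $log f (y_{i} | η_{i})$ and $log f (x_{i} | ξ_{i})$ of the complete log-likelihood.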

If $η_{1}$ is a scalar that has only one binary indicator, we restrict $Ψ_{11} = 0$ and $β_{y, 11} = 1$. The modified complete log-likelihood is then:

h_{i} (Υ_{i}; θ) = log f (y_{i} | η_{i}) + log f (x_{i} | ξ_{i}) + log f (η_{2, i} | ξ_{i}) + log f (ξ_{i}),

where

η_{1, i} = {(I_{K_{η_{1}}} - B_{11})}^{- 1} [B_{12} η_{2, i} + Γ_{1} ξ_{i} + (I_{K_{η_{1}}} \otimes ξ_{i}^{T}) Ω_{1} ξ_{i} + (I_{K_{η_{1}}} \otimes ξ_{i}^{T}) Π η_{2, i} + (I_{K_{η_{1}}} \otimes η_{2, i}^{T}) Ξ η_{2, i}] .

Accordingly, the vector of latent variables to be integrated out is $Υ_{i} = (η_{2, i}^{T}, ξ_{i}^{T})^{T}$.

Laplace Approximation

In general, the first-order Laplace approximation (Barndorff-Nielsen & Cox, 1989) to the integral $\int exp \{g (t)\} d t$ is:

(2 π)^{K / 2} {[det (- \frac{\partial^{2} g (\hat{t})}{\partial t \partial t^{T}})]}^{- 1 / 2} exp \{g (\hat{t})\},

where $t$ is a $K \times 1$ vector and $g (t)$ is a unimodal function with the mode $\hat{t}$. The approximation is essentially achieved by Taylor expanding $g (t)$ about its mode. Shun and McCullagh (1995) further derived the second-order Laplace approximation. The Laplace approximation of the observed log-likelihood is:

(7)

ℓ_{(Lap)} (Υ; θ) = \sum_{i = 1}^{n} [\frac{K}{2} log (2 π) + h_{i} (Υ_{i}; θ) - \frac{1}{2} log det (- H_{i} (Υ_{i}; θ)) + log (1 + R_{2} (Υ_{i}; θ))],

where $K = K_{ξ} + K_{η}$ in the general case or $K = K_{ξ} + K_{η_{2}}$ under the modified complete log-likelihood,

H_{i} (Υ_{i}; θ) = \frac{\partial^{2} h_{i} (Υ_{i}; θ)}{\partial Υ_{i} \partial Υ_{i}^{T}},

is the Hessian matrix, and $Υ_{i}$ is evaluated at ${\hat{Υ}}_{i}$, the solution of

\frac{\partial h_{i} (Υ_{i}; θ)}{\partial Υ_{i}} = 0,

with respect to $Υ_{i}$ for a fixed $θ$. In the first-order approximation, $R_{2} (Υ_{i}; θ) = 0$ for any $i$. In the second-order approximation,

R_{2} (Υ_{i}; θ) = - \frac{1}{2} [\frac{1}{4} \sum_{j, l, m, r = 1}^{K} (- \frac{\partial^{4} h_{i} (Υ_{i}; θ)}{\partial Υ_{j} \partial Υ_{l} \partial Υ_{m} \partial Υ_{r}} c_{j m} c_{l r}) - \sum_{j, l, m, r, s, t = 1}^{K} \frac{\partial^{3} h_{i} (Υ_{i}; θ)}{\partial Υ_{j} \partial Υ_{l} \partial Υ_{m}} \frac{\partial^{3} h_{i} (Υ_{i}; θ)}{\partial Υ_{r} \partial Υ_{s} \partial Υ_{t}} (\frac{1}{4} c_{j l} c_{m r} c_{s t} + \frac{1}{6} c_{j r} c_{l s} c_{m t})],

where $c_{j m}$ is the $(j, m)$th element of the inverse of $- H_{i} (Υ_{i}; θ)$ and the approximation is second-order accurate (Shun & McCullagh, 1995). The second-order Laplace approximation has been recommended by Lee and Nelder (2001) for generalized linear mixed models and by Jin, Noh, and Lee (2018) for confirmatory factor analysis models with ordinal data.
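To make the first-order formula concrete, the sketch below applies it to a one-dimensional toy integral rather than to the SEM likelihood. The choice $g (t) = a t - exp (t)$ is an assumption made for illustration: its integral equals the gamma function $Γ (a)$, so the approximation error can be checked exactly, and the first-order Laplace value coincides with Stirling's formula. The function name `laplace_1d` and the Newton settings are likewise hypothetical.

```python
import math

def laplace_1d(g, dg, d2g, start, tol=1e-12, max_iter=100):
    """First-order Laplace approximation to the integral of exp(g(t)), K = 1.

    The mode is located by Newton's method; the return value is
    (approximation, mode), with approximation equal to
    sqrt(2 * pi / (-g''(mode))) * exp(g(mode)).
    """
    t = start
    for _ in range(max_iter):
        step = dg(t) / d2g(t)
        t -= step
        if abs(step) < tol:
            break
    return math.sqrt(2.0 * math.pi / -d2g(t)) * math.exp(g(t)), t

a = 10.0  # hypothetical shape value; the exact integral is Gamma(a)

def g(t):
    return a * t - math.exp(t)

def dg(t):
    return a - math.exp(t)

def d2g(t):
    return -math.exp(t)

approx, mode = laplace_1d(g, dg, d2g, start=2.0)
exact = math.gamma(a)
relative_error = (approx - exact) / exact
```

Here the mode is $log a$ and the relative error is close to $- 1 / (12 a)$, which is the leading term that a second-order correction of the form $log (1 + R_{2})$ is designed to remove.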

Adaptive Gauss–Hermite Quadrature Approximation

The Gauss–Hermite quadrature rule approximates the integral $\int exp \{g (t)\} d t$ by a weighted sum of function values evaluated at pre-specified quadrature points as:

\sum_{k_{1} = 1}^{q} \dots \sum_{k_{K} = 1}^{q} w_{k_{1} k_{2} \dots k_{K}}^{*} exp \{g (z_{k_{1}}, \dots, z_{k_{K}})\},

where

w_{k_{1} k_{2} \dots k_{K}}^{*} = \prod_{j = 1}^{K} w_{k_{j}} exp \{z_{k_{j}}^{2}\},

with $z_{k_{j}}$ being the Gauss–Hermite quadrature points, $w_{k_{j}}$ being the corresponding weights, and $q$ being the number of quadrature points per latent variable. Liu and Pierce (1994) proposed to translate and dilate the quadrature points in the weighted sum, known as the AGHQ approximation, to improve the accuracy of the approximation when $g (t)$ is a unimodal function. The AGHQ approximated observed log-likelihood is:

(8)

ℓ_{(AGHQ)} (\hat{Υ}; θ) = \sum_{i = 1}^{n} (\frac{K}{2} log (2) - \frac{1}{2} log det (- H_{i} ({\hat{Υ}}_{i}; θ)) + log \sum_{k_{1} = 1}^{q} \dots \sum_{k_{K} = 1}^{q} \{w_{k_{1} k_{2} \dots k_{K}}^{*} exp [h_{i} ({\tilde{Υ}}_{i} (z_{k_{1} k_{2} \dots k_{K}}); θ)]\}),

where

{\tilde{Υ}}_{i} (z_{k_{1} k_{2} \dots k_{K}}) = \sqrt{2} L_{i} z_{k_{1} k_{2} \dots k_{K}} + {\hat{Υ}}_{i}

is the latent variable vector dilated and translated around the mode ${\hat{Υ}}_{i}$, with $L_{i}$ being the Cholesky factor of the inverse of the negative Hessian evaluated at the mode:

L_{i} L_{i}^{T} = {(- H_{i} ({\hat{Υ}}_{i}; θ))}^{- 1},

and $z_{k_{1} k_{2} \dots k_{K}}$ being the vector containing the $k_{1}$th, $k_{2}$th, …, and $k_{K}$th quadrature points. If only one quadrature point per latent variable is used, $ℓ_{(AGHQ)} ({\hat{Υ}}_{i}; θ)$ is the same as the first-order approximated $ℓ_{(Lap)} ({\hat{Υ}}_{i}; θ)$. Let ${\hat{θ}}_{(AGHQ)}$ be the estimator of $θ$ that maximizes (8). Following Bianconcini (2014), Jin and Andersson (2019b) showed that the error rate of the approximation is

O ({(p_{x} + p_{y})}^{- ⌊(q + 2) / 3⌋}),

and that ${\hat{θ}}_{(AGHQ)}$ satisfies

{\hat{θ}}_{(AGHQ)} - θ_{0} = O_{p} [max (n^{- 1 / 2}, {(p_{x} + p_{y})}^{- ⌊(q + 2) / 3⌋})],

where $θ_{0}$ is the true parameter vector and $⌊\cdot⌋$ denotes the floor function, that is, the largest integer not exceeding the enclosed value. Hence, using $5$ quadrature points per latent variable gives the exponent $⌊7 / 3⌋ = 2$ and therefore the same error rate as the second-order Laplace approximation.
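The one-dimensional version of (8) can be sketched in a few lines of Python. As an illustration only, the same kind of toy integrand is used as a stand-in for $exp \{h_{i}\}$: $g (t) = a t - exp (t)$, whose integral is $Γ (a)$, whose mode is $log a$, and whose negative second derivative at the mode is $a$. The names `GH_RULES` and `aghq_1d` are hypothetical; the tabulated nodes and weights are the standard 1-point and 5-point Gauss–Hermite rules for the weight function $exp (- z^{2})$.

```python
import math

# Gauss-Hermite nodes and weights for the weight function exp(-z^2).
GH_RULES = {
    1: ([0.0], [math.sqrt(math.pi)]),
    5: (
        [-2.020182870456086, -0.958572464613819, 0.0,
         0.958572464613819, 2.020182870456086],
        [0.019953242059046, 0.393619323152241, 0.945308720482942,
         0.393619323152241, 0.019953242059046],
    ),
}

def aghq_1d(g, mode, neg_hessian, q):
    """AGHQ approximation to the integral of exp(g(t)) with q nodes, K = 1."""
    scale = math.sqrt(2.0 / neg_hessian)  # sqrt(2) * L with L = (-g'')^(-1/2)
    nodes, weights = GH_RULES[q]
    total = sum(
        w * math.exp(z * z) * math.exp(g(mode + scale * z))  # w* times exp(g)
        for z, w in zip(nodes, weights)
    )
    return scale * total

a = 10.0  # hypothetical value; the exact integral is Gamma(a)

def g(t):
    return a * t - math.exp(t)

mode, neg_hessian = math.log(a), a
exact = math.gamma(a)
error_q1 = aghq_1d(g, mode, neg_hessian, 1) / exact - 1.0
error_q5 = aghq_1d(g, mode, neg_hessian, 5) / exact - 1.0
laplace = math.sqrt(2.0 * math.pi / neg_hessian) * math.exp(g(mode))

# Exponent floor((q + 2) / 3) of the error rate for several choices of q.
rate_exponent = {q: (q + 2) // 3 for q in (1, 3, 5, 7)}
```

With one node the result reproduces the first-order Laplace value, while five nodes reduce the relative error substantially in this toy case. The dictionary `rate_exponent` evaluates to `{1: 1, 3: 1, 5: 2, 7: 3}`, which shows that moving from one to three nodes does not change the exponent, whereas five nodes raise it to 2.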

Direct Maximum Likelihood

When maximizing functions such as (7) or (8) it is important to realize that the mode ${\hat{Υ}}_{i}$ depends on the parameter vector $θ$ . The gradient of the approximated observed log-likelihood (regardless of approximation method) is:

(9)

\frac{\partial ℓ (\hat{Υ} (θ); θ)}{\partial θ} = \frac{\partial ℓ (θ)}{\partial θ} |_{Υ = \hat{Υ}} + {(\frac{\partial \hat{Υ} (θ)}{\partial θ^{T}})}^{T} \frac{\partial ℓ (Υ)}{\partial Υ} |_{Υ = \hat{Υ}},

where $\partial ℓ (θ) / \partial θ$ is the derivative of the approximated observed log-likelihood function with respect to $θ$, treating the mode as constant. In contrast, $\partial ℓ (Υ) / \partial Υ$ is the derivative of the approximated observed log-likelihood function with respect to $Υ$, treating $θ$ as constant. Hence, the second term acknowledges, through the chain rule, that the mode is affected by $θ$. By the implicit function theorem,

\frac{\partial {\hat{Υ}}_{i} (θ)}{\partial θ^{T}} = - {(\frac{\partial^{2} h_{i} (Υ_{i}; θ)}{\partial Υ_{i} \partial Υ_{i}^{T}} |_{Υ_{i} = {\hat{Υ}}_{i}})}^{- 1} \frac{\partial^{2} h_{i} (Υ_{i}; θ)}{\partial Υ_{i} \partial θ^{T}} |_{Υ_{i} = {\hat{Υ}}_{i}} .
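The implicit function theorem result can be checked numerically on a scalar toy problem. In the Python sketch below, the function $h (Υ; θ) = θ Υ - exp (Υ) - Υ^{2} / 2$ is a hypothetical stand-in for $h_{i}$, chosen because it is strictly concave in $Υ$ and has the simple cross-derivative $\partial^{2} h / \partial Υ \partial θ = 1$. The analytic derivative of the mode is compared with a central finite difference.

```python
import math

def find_mode(theta, start=0.0, tol=1e-13, max_iter=100):
    """Newton search for the maximizer of h(u; theta) = theta*u - exp(u) - u**2/2."""
    u = start
    for _ in range(max_iter):
        gradient = theta - math.exp(u) - u
        hessian = -math.exp(u) - 1.0
        step = gradient / hessian
        u -= step
        if abs(step) < tol:
            break
    return u

def mode_derivative(theta):
    """d(mode)/d(theta) = -(d2h/du2)^(-1) * d2h/(du dtheta), evaluated at the mode."""
    u_hat = find_mode(theta)
    h_uu = -math.exp(u_hat) - 1.0   # second derivative with respect to u
    h_u_theta = 1.0                 # cross derivative with respect to u and theta
    return -h_u_theta / h_uu

theta = 2.0   # hypothetical parameter value
eps = 1e-6
finite_difference = (find_mode(theta + eps) - find_mode(theta - eps)) / (2.0 * eps)
analytic = mode_derivative(theta)
```

In the multivariate case the scalar division becomes a linear solve with the Hessian $H_{i}$, which is already available from the mode search.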

The sandwich estimator of standard errors can be calculated from:

(10)

{(\hat{\frac{\partial^{2} ℓ (θ)}{\partial θ \partial θ^{T}}})}^{- 1} \sum_{i = 1}^{n} [\frac{\partial ℓ_{i} (\hat{θ})}{\partial θ} \frac{\partial ℓ_{i} (\hat{θ})}{\partial θ^{T}}] {(\hat{\frac{\partial^{2} ℓ (θ)}{\partial θ \partial θ^{T}}})}^{- 1},

where $ℓ_{i}$ is the $i$th term in Equation (7) or (8). In the BFGS algorithm, the inverse of the Hessian matrix of $ℓ (\hat{Υ} (θ); θ)$ is approximated and updated by low-rank approximations at every iteration. The Hessian matrix can also be approximated by the Louis (1982) method. Since the exact gradient of $ℓ (\hat{Υ} (θ); θ)$ is calculated by Equation (9), the sandwich estimator (10) of standard errors can be readily computed. In contrast, neither the gradient nor the Hessian matrix is a by-product of the EM algorithm; both need to be computed afterward.
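The structure of the sandwich formula (10) can be illustrated with a scalar parameter, where the matrix inverses reduce to divisions. The Python sketch below uses a hypothetical working model $ℓ_{i} (θ) = - (y_{i} - θ)^{2} / 2$ in place of the $i$th term of (7) or (8); the data values and function name are assumptions made for illustration.

```python
def sandwich_variance(scores, hessian):
    """Scalar sandwich: H^(-1) * sum_i(score_i^2) * H^(-1)."""
    meat = sum(s * s for s in scores)
    return meat / (hessian * hessian)

# Hypothetical data; the maximizer of sum_i l_i(theta) is the sample mean.
y = [1.0, 3.0, 2.0, 6.0, 3.0]
theta_hat = sum(y) / len(y)

scores = [yi - theta_hat for yi in y]  # per-observation gradients at theta_hat
hessian = -float(len(y))               # second derivative of the summed log-likelihood

robust_variance = sandwich_variance(scores, hessian)
model_based_variance = -1.0 / hessian  # inverse negative Hessian alone
robust_standard_error = robust_variance ** 0.5
```

For these numbers the sandwich variance is 0.56, whereas the inverse negative Hessian alone gives 0.2; the difference reflects that the data are more dispersed than the unit variance assumed by the working model. With a parameter vector, `scores` becomes an $n \times \dim (θ)$ matrix of per-observation gradients from Equation (9) and the divisions become matrix inversions.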
