
A Faster Procedure for Estimating SEMs Applying Minimum Distance Estimators With a Fixed Weight Matrix

Pages 214-221 | Received 23 Jan 2022, Accepted 07 May 2022, Published online: 10 Oct 2022

Abstract

This study presents a separable nonlinear least squares (SNLLS) implementation of the minimum distance (MD) estimator employing a fixed-weight matrix for estimating structural equation models (SEMs). In contrast to the standard implementation of the MD estimator, in which the complete set of parameters is estimated using nonlinear optimization, the SNLLS implementation allows a subset of parameters to be estimated using (linear) least squares (LS). The SNLLS implementation possesses a number of benefits, such as faster convergence, better performance in ill-conditioned estimation problems, and fewer required starting values. The present work demonstrates that SNLLS, when applied to SEM estimation problems, significantly reduces the estimation time. Reduced estimation time makes SNLLS particularly useful in applications involving some form of resampling, such as simulation and bootstrapping.

1. Introduction

This study addresses the application of separable nonlinear least squares (SNLLS) when performing covariance structure analysis (CSA). SNLLS was first introduced by Golub and Pereyra (1973), who showed that for a certain type of nonlinear estimation problem, a subset of parameters can be estimated using numerically efficient least squares (LS). As will be discussed below, several studies have shown that parameter separation offers a number of numerical benefits, such as faster convergence, better performance when the estimation problem is ill-conditioned (i.e., when the ratio between the largest and the smallest singular value of the covariance matrix is large), and fewer required starting values.

SNLLS is typically applied to problems involving some form of nonlinear regression analysis, but not exclusively so. A recent study by Kreiberg et al. (2021) suggested an SNLLS implementation of the minimum distance (MD) estimator for estimating confirmatory factor analysis (CFA) models. The motivation for the current study is to generalize the results in Kreiberg et al. (2021) by outlining an SNLLS implementation for estimating structural equation models (SEMs). This is important for several reasons. First, it makes SNLLS applicable to a wider range of models. Second, at this stage, little is known about the potential benefits of applying SNLLS in the context of CSA. The outlined SNLLS implementation may pave the way for future research on how to improve the numerical performance of CSA based estimators.

To make the idea of SNLLS clearer, consider the familiar MD quadratic form objective function (1) F(ϑ) = (sx − σx(ϑ))TV(sx − σx(ϑ)), where sx and σx(ϑ) are covariance vectors derived from the sample and the model, respectively, ϑ is the parameter vector and V is a weighting matrix chosen by the user. We consider the case in which V is a fixed matrix (i.e., when V is not a function of ϑ). Such cases include well-known estimators such as unweighted least squares (ULS), generalized least squares (GLS), and weighted least squares (WLS). The standard implementation of Equation (1) is a one-step estimation procedure, here referred to as nonlinear least squares (NLLS), that involves the use of nonlinear optimization techniques. Estimation is performed by searching the parameter space for the value of ϑ that minimizes Equation (1). In contrast, the SNLLS implementation of Equation (1) is a two-step estimation procedure that works by splitting ϑ into two subsets. In the first step, one subset of parameters is estimated using nonlinear optimization. In the second step, based on the estimates obtained in the first step, the remaining subset of parameters is estimated using LS. As demonstrated in Kreiberg et al. (2021), SNLLS provides parameter estimates and a minimum objective function value identical to those obtained using NLLS. It follows that the asymptotic properties of the estimator are maintained. The presentation below provides a general framework for accomplishing parameter separation in the case of SEMs.

Over the years, SNLLS has become popular in applied research across a wide range of scientific disciplines. Golub and Pereyra (2003) compiled a list of real-world examples of SNLLS applications. Mullen (2008) subsequently provided a comprehensive overview of SNLLS for a number of applications in physics and chemistry. SNLLS has also proved useful in systems and control applications. For instance, Söderström et al. (2009), Söderström and Mossberg (2011), and Kreiberg et al. (2016) applied CSA to handle the errors-in-variables (EIV) estimation problem. The work in these studies showed how to implement the MD estimator using SNLLS.

Several studies have documented that the SNLLS implementation of nonlinear estimators offers a number of benefits. For instance, Sjöberg and Viberg (1997) evaluated the numerical performance of SNLLS when applied to neural-network minimization problems. Their main conclusions were that SNLLS provides faster convergence and performs better in cases in which the estimation problem is ill-conditioned. A recent study by Dattner et al. (2020) investigated the performance of SNLLS when applied to estimation problems involving ordinary differential equations (ODEs). Their simulations showed that SNLLS provides faster convergence as well as parameter estimates of similar or higher accuracy than what is achieved by traditional nonlinear procedures.

The remainder of this article is organized as follows. Section 2 establishes the notation used throughout the article. In this section, we provide a brief overview of the SEM framework and the associated MD estimator. Section 3 outlines how to modify the MD objective function to accommodate the SNLLS implementation of the estimator when applied to SEMs. Section 4 compares the numerical efficiency of SNLLS and NLLS when applied to real-world estimation problems. Finally, Section 5 presents some concluding remarks.

2. Background

2.1. Notation

Before presenting the SEM framework, it will be useful to introduce the following notation. Let x be a p×1 zero-mean random vector, and let Σx be the associated p×p covariance matrix given by (2) Σx = E[xxT], where E is the expectation operator and the superscript T denotes the transpose of a vector or a matrix. The number of nonredundant elements in Σx is h = ½p(p+1), given that no restrictions other than symmetry are placed on the elements of Σx. A covariance vector containing the nonredundant elements (i.e., the lower half of Σx including the diagonal) is (3) σx = vech(Σx). In this expression, vech is the operation of vectorizing the nonredundant elements of Σx. Alternatively, σx is obtained by (4) σx = KxTvec(Σx). Here, vec is the operation of vectorizing the elements of a matrix by stacking its columns, and Kx is a p²×h matrix obtained from (5) Kx = Lx(LxTLx)⁻¹, where Lx is a p²×h selection matrix containing only ones and zeros. This matrix has the additional usage (6) vec(Σx) = Lxσx. In the case of symmetry, Lx is referred to as the duplication matrix in the literature (see Magnus & Neudecker, 1999). The matrices Lx and Kx can be formed to handle covariance matrices with additional structure beyond symmetry. For instance, in the case that Σx is diagonal, Kx is constructed so that σx contains only the elements on the diagonal of Σx. Appendix A outlines a general framework for obtaining Lx and Kx for various structures characterizing Σx.
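To make the notation concrete, the following R sketch (our own illustration, not part of the original presentation) builds Lx and Kx for a small symmetric covariance matrix and checks Equations (3)–(6); the helper duplication_matrix() and the example numbers are assumptions for illustration only.

```r
## Minimal R sketch of Equations (3)-(6) for a symmetric covariance matrix.
## duplication_matrix() is an illustrative helper; the numbers are arbitrary.
duplication_matrix <- function(p) {
  h <- p * (p + 1) / 2
  L <- matrix(0, nrow = p^2, ncol = h)
  col <- 0
  for (j in 1:p) {          # columns of Sigma_x (lower half, including diagonal)
    for (i in j:p) {
      col <- col + 1
      L[(j - 1) * p + i, col] <- 1   # position of sigma_{i,j} in vec(Sigma_x)
      L[(i - 1) * p + j, col] <- 1   # its symmetric counterpart sigma_{j,i}
    }
  }
  L
}

p     <- 3
Sigma <- matrix(c(4, 2, 1,
                  2, 3, 1,
                  1, 1, 2), p, p)                  # small symmetric example
Lx <- duplication_matrix(p)                        # p^2 x h selection matrix
Kx <- Lx %*% solve(crossprod(Lx))                  # Equation (5)
sigma_x <- drop(t(Kx) %*% as.vector(Sigma))        # Equation (4) = vech(Sigma)
all.equal(as.vector(Sigma), drop(Lx %*% sigma_x))  # Equation (6) holds
```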

We now expand the previous notation. Let x1 and x2 be p1×1 and p2×1 zero-mean random vectors, respectively. A p = p1 + p2 dimensional column vector is given by (7) x = (x1T x2T)T. The associated p×p covariance matrix is (8) Σx = ( Σx1  Σx2,x1T ; Σx2,x1  Σx2 ), where the blocks are of dimensions p1×p1, p1×p2, p2×p1, and p2×p2, respectively, and where (9) Σx1 = E[x1x1T], Σx2 = E[x2x2T], Σx2,x1 = E[x2x1T].

As before, the vector consisting of the nonredundant elements of Σx is given by σx. However, for later, it will be more convenient to work with the vector (10) σ̄x = (σx1T σx2T σx2,x1T)T, where (11) σx1 = Kx1Tvec(Σx1), σx2 = Kx2Tvec(Σx2), σx2,x1 = vec(Σx2,x1).

Note that σ̄x contains the same elements as σx, but in a different order. The last equation in Equation (11) follows from the fact that there is no redundancy in Σx2,x1. Appendix A shows how to derive a matrix L̄x. Then, by using Equations (4) and (5), we obtain the covariance vector (12) σ̄x = K̄xTvec(Σx).

2.2. The SEM Framework

With the basic notation in place, we are ready to introduce the SEM framework, which consists of the following three equations (excluding constant terms) (13) η=Bη+Γξ+δ,(13) (14) x1=Λ1η+ϵ1,(14) (15) x2=Λ2ξ+ϵ2.(15) The first equation is the structural equation, which specifies the causal relationships among the latent variables. In this equation, η and ξ are respectively pη×1 and pξ×1 random vectors, δ is a pη×1 random noise vector, and B and Γ are respectively pη×pη and pη×pξ parameter matrices relating the latent random vectors. The last two equations are measurement equations. In these equations, x1 and x2 are respectively p1×1 and p2×1 observed random vectors, ϵ1 and ϵ2 are noise vectors of similar dimensions, and Λ1 and Λ2 are respectively p1×pη and p2×pξ parameter matrices relating the observed and the latent random vectors. All random vectors are zero-mean.

It is assumed that I − B, where I is the identity matrix, is nonsingular such that η is uniquely determined by ξ and δ. It is further assumed that δ and ξ are mutually uncorrelated, and that ϵ1 and ϵ2 are mutually uncorrelated with η and ξ, respectively. The noise vectors ϵ1 and ϵ2 are allowed to correlate.

The specification additionally includes the following covariance matrices (16) Σξ = E[ξξT], Σδ = E[δδT], Σϵ1 = E[ϵ1ϵ1T], Σϵ2 = E[ϵ2ϵ2T], Σϵ2,ϵ1 = E[ϵ2ϵ1T]. The nonredundant elements of Σξ, Σδ, Σϵ1, Σϵ2 and Σϵ2,ϵ1 are given by the covariance vectors (17) σξ = KξTvec(Σξ), σδ = KδTvec(Σδ), σϵ1 = Kϵ1Tvec(Σϵ1), σϵ2 = Kϵ2Tvec(Σϵ2), σϵ2,ϵ1 = vec(Σϵ2,ϵ1). Let ϑ be a parameter vector containing the free elements in B, Γ, Λ1, Λ2, Σξ, Σδ, Σϵ1, Σϵ2 and Σϵ2,ϵ1, and let H = (I − B)⁻¹. The covariance matrix implied by Equations (13)–(15) is (18) Σx(ϑ) = ( Λ1H(ΓΣξΓT + Σδ)HTΛ1T + Σϵ1   Λ1HΓΣξΛ2T + Σϵ2,ϵ1T ; Λ2ΣξΓTHTΛ1T + Σϵ2,ϵ1   Λ2ΣξΛ2T + Σϵ2 ).
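As an illustration of Equation (18), a small R function that assembles the implied covariance matrix from the parameter matrices might look as follows; this is our own sketch and the argument names are ours, not the authors' code.

```r
## Sketch of the model-implied covariance matrix in Equation (18).
implied_cov <- function(B, Gamma, Lambda1, Lambda2,
                        Sigma_xi, Sigma_delta,
                        Sigma_eps1, Sigma_eps2, Sigma_eps21) {
  H   <- solve(diag(nrow(B)) - B)               # H = (I - B)^{-1}
  S11 <- Lambda1 %*% H %*%
         (Gamma %*% Sigma_xi %*% t(Gamma) + Sigma_delta) %*%
         t(H) %*% t(Lambda1) + Sigma_eps1
  S21 <- Lambda2 %*% Sigma_xi %*% t(Gamma) %*% t(H) %*% t(Lambda1) + Sigma_eps21
  S22 <- Lambda2 %*% Sigma_xi %*% t(Lambda2) + Sigma_eps2
  rbind(cbind(S11, t(S21)),                     # upper-right block is S21'
        cbind(S21, S22))
}
```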

2.3. The MD Estimator

Suppose that a sample of data points xi (for i = 1, …, N) is available. An estimate of Σx is then computed using (19) Sx = (1/N) ∑_{i=1}^{N} xixiT. Given Sx, the aim is to estimate the true parameter vector ϑ0. An estimate of ϑ0 is obtained by (20) ϑ̂ = arg min_ϑ F(ϑ), where F(ϑ) is a scalar function that expresses the distance between the observed and the model-implied covariance structure. Below, we focus on the MD objective function given by (21) F(ϑ) = (sx − σx(ϑ))TV(sx − σx(ϑ)). In this expression, sx and σx(ϑ) are vectors containing the nonredundant elements of Sx and Σx(ϑ), respectively. That is, (22) sx = KxTvec(Sx), σx(ϑ) = KxTvec(Σx(ϑ)). Moreover, the matrix V is a positive definite weighting matrix. Under suitable conditions, and for the right choice of V, the MD estimator is consistent and asymptotically normal. Note that consistency does not depend on V as long as V converges in probability to a symmetric positive definite matrix.
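The standard NLLS route can be sketched in a few lines of R; the code below is our own illustrative outline, in which sigma_fun is a hypothetical user-supplied mapping from ϑ to σx(ϑ) and s_x and V are taken as given.

```r
## NLLS sketch of Equations (19)-(21); sigma_fun(theta) must return the
## model-implied covariance vector in the same order as s_x.
md_objective <- function(theta, s_x, sigma_fun, V) {
  r <- s_x - sigma_fun(theta)            # residual s_x - sigma_x(theta)
  drop(t(r) %*% V %*% r)                 # quadratic form, Equation (21)
}

fit_nlls <- function(theta_start, s_x, sigma_fun, V) {
  optim(theta_start, md_objective,
        s_x = s_x, sigma_fun = sigma_fun, V = V,
        method = "BFGS")                 # quasi-Newton search over all of theta
}

## For zero-mean data X (N x p), the sample covariance of Equation (19) is
## Sx <- crossprod(X) / nrow(X), and s_x <- drop(t(Kx) %*% as.vector(Sx)).
```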

Using an appropriate algorithm, Equation (21) is minimized by numerically searching the parameter space until some convergence criterion is satisfied. For the estimation problem to be feasible, it is a necessary condition that the number of elements in sx − σx(ϑ) is at least as large as the number of free parameters in ϑ.

3. Modifying the MD Quadratic Form Objective Function

Next, we outline how to modify the objective function in Equation (21) to accommodate the SNLLS implementation. To do so, we need some additional notation. Let ϑβ,γ,λ be a tϑβ,γ,λ×1 vector containing the free elements in B, Γ, Λ1, and Λ2, and let σξ,δ,ϵ be a tσξ,δ,ϵ×1 vector containing the free elements in Σξ, Σδ, Σϵ1, Σϵ2, and Σϵ2,ϵ1. The vector σξ,δ,ϵ is formed by (23) σξ,δ,ϵ = (σξT σδT σϵ1T σϵ2T σϵ2,ϵ1T)T. The complete parameter vector now becomes (24) ϑ = (ϑβ,γ,λT σξ,δ,ϵT)T. The key to applying SNLLS is the separation of parameters, which involves expressing the covariance vector in Equation (10) using (25) σ̄x = G(ϑβ,γ,λ)σξ,δ,ϵ. In this expression, G(ϑβ,γ,λ) is a tall matrix-valued function (i.e., a matrix consisting of more rows than columns) assumed to have full column rank. In Appendix B, it is shown that G(ϑβ,γ,λ) takes the general form (26) G(ϑβ,γ,λ) = ( Kx1T(Λ1HΓ ⊗ Λ1HΓ)Lξ  Kx1T(Λ1H ⊗ Λ1H)Lδ  Kx1TLϵ1  0  0 ; Kx2T(Λ2 ⊗ Λ2)Lξ  0  0  Kx2TLϵ2  0 ; (Λ1HΓ ⊗ Λ2)Lξ  0  0  0  I ), where ⊗ is the Kronecker product, the 0s are zero matrices of compatible sizes, and I is the identity matrix. It is now possible to write the objective function using (27) F(ϑβ,γ,λ, σξ,δ,ϵ) = (s̄x − G(ϑβ,γ,λ)σξ,δ,ϵ)TV̄(s̄x − G(ϑβ,γ,λ)σξ,δ,ϵ), where s̄x and V̄ correspond to sx and V, respectively, but with their rows and columns rearranged according to the order in σ̄x. For some value of G(ϑβ,γ,λ), the solution to the problem of minimizing Equation (27) w.r.t. σξ,δ,ϵ is a straightforward application of LS (28) σ̂ξ,δ,ϵ(ϑβ,γ,λ) = (GT(ϑβ,γ,λ)V̄G(ϑβ,γ,λ))⁻¹GT(ϑβ,γ,λ)V̄s̄x. Since σ̂ξ,δ,ϵ depends on ϑβ,γ,λ, it is necessary to outline how to obtain an estimate ϑ̂β,γ,λ without directly involving σξ,δ,ϵ. Theorem 2.1 in Golub and Pereyra (1973) provides the justification for replacing σξ,δ,ϵ in Equation (27) with the right-hand side of Equation (28). Doing so leads to the modified objective function (29) F(ϑβ,γ,λ) = s̄xTV̄s̄x − s̄xTV̄G(ϑβ,γ,λ)(GT(ϑβ,γ,λ)V̄G(ϑβ,γ,λ))⁻¹GT(ϑβ,γ,λ)V̄s̄x. Apart from some slight notational differences, the derivation of Equation (29) is similar to the derivation in Kreiberg et al. (2021). From the preceding presentation, it follows that SNLLS is a two-step procedure. In the first step, ϑ̂β,γ,λ is obtained by minimizing Equation (29) applying nonlinear optimization. In the second step, using ϑ̂β,γ,λ from the first step, σ̂ξ,δ,ϵ is obtained by Equation (28).
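The two steps can be sketched compactly in R. The code below is an illustrative outline of our own (not the authors' implementation), in which G_fun is a hypothetical user-supplied function returning G(ϑβ,γ,λ) of Equation (26), and s_bar and V_bar are the rearranged sample vector and weight matrix.

```r
## SNLLS sketch of Equations (27)-(29).
snlls_objective <- function(theta_bgl, s_bar, G_fun, V_bar) {
  G   <- G_fun(theta_bgl)
  GtV <- t(G) %*% V_bar                          # G'V
  ## Equation (29): s'Vs - s'VG (G'VG)^{-1} G'Vs
  drop(t(s_bar) %*% V_bar %*% s_bar -
       t(s_bar) %*% t(GtV) %*% solve(GtV %*% G, GtV %*% s_bar))
}

snlls_fit <- function(theta_start, s_bar, G_fun, V_bar) {
  ## Step 1: nonlinear search over the (small) vector theta_bgl only
  step1 <- optim(theta_start, snlls_objective,
                 s_bar = s_bar, G_fun = G_fun, V_bar = V_bar,
                 method = "BFGS")
  ## Step 2: Equation (28), the covariance parameters by linear LS
  G   <- G_fun(step1$par)
  GtV <- t(G) %*% V_bar
  sigma_hat <- solve(GtV %*% G, GtV %*% s_bar)
  list(theta_bgl = step1$par, sigma_xde = drop(sigma_hat), value = step1$value)
}
```

Because the profiled objective in Equation (29) attains the same minimum as Equation (27), a wrapper of this kind returns the same estimates and objective value as the full NLLS search, as noted in Section 1.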

The major benefit of the formulation in Equation (29) is that the minimization w.r.t. ϑβ,γ,λ represents a lower dimensional optimization problem. Thus, the computational load when minimizing F(ϑβ,γ,λ) w.r.t. ϑβ,γ,λ is smaller, in some cases by a considerable margin, than when minimizing Equation (21) w.r.t. ϑ. This is especially the case when the number of elements in σξ,δ,ϵ is large compared with the number of elements in ϑβ,γ,λ.

4. Illustrations

This section provides two examples that illustrate the difference in numerical efficiency between the two implementations, SNLLS and NLLS, of the MD estimator when applied to SEMs. Numerical performance is assessed by studying the convergence of the optimizer and the time it takes the optimizer to reach its minimum. Since timing depends on other processes running on the device performing the estimation, it is recommended to compute the average estimation time over multiple runs. Estimation and timing are performed using Matlab (2020, version R2020b). The two implementations are compared under the following conditions:

  • Algorithm: The optimizer is a Quasi-Newton (QN) design applying the Broyden–Fletcher–Goldfarb–Shanno (BFGS) Hessian update mechanism (default in Matlab).

  • Gradient: For simplicity, the gradient is computed using a finite difference approach. The computation is based on a centered design, which generally provides greater accuracy at the expense of being more time-consuming.

  • Tolerances: Tolerances are set to their default values (details are found in the Matlab documentation).

  • Starting values: Starting values are taken from the open-source R (R Core Team, 2021) package lavaan (Rosseel, 2012). The starting values for the free elements are as follows:

    • – Λ1 and Λ2 are computed using the non-iterative fabin 3 estimator (see Hägglund, 1982).

    • – B and Γ are set to zero.

    • – Σξ and Σδ are set to zero except for the diagonal elements, which are set to 0.05.

    • – Σϵ1 and Σϵ2 are set to zero except for the diagonal elements, which are set to half the observed variance. For the examples below, no starting values are required for the elements in Σϵ2,ϵ1.

Note that SNLLS only requires starting values for the parameter vector ϑβ,γ,λ, whereas NLLS requires starting values for the complete parameter vector ϑ.
  • Estimator: The GLS estimator is used throughout the examples. The GLS estimator uses a weight matrix of the form (30) V = ½LxT(S⁻¹ ⊗ S⁻¹)Lx, where S is the sample covariance matrix (a small sketch of this computation follows the list).

  • Timing: In each example, the model is re-estimated 1000 times using the same empirical covariance matrix as input.
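As referenced in the Estimator item above, a minimal R sketch of the weight matrix in Equation (30) could look as follows; it reuses the illustrative duplication_matrix() helper from the sketch in Section 2.1, and the function name is ours.

```r
## GLS weight matrix of Equation (30): V = 1/2 * Lx'(S^{-1} (x) S^{-1})Lx.
gls_weight <- function(S) {
  Lx    <- duplication_matrix(nrow(S))     # see the sketch in Section 2.1
  S_inv <- solve(S)
  0.5 * t(Lx) %*% kronecker(S_inv, S_inv) %*% Lx
}
```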

To ensure that our programming is correct, we compared the estimation results to the results obtained using lavaan.

4.1. Example 1

The first example considers a model for the medical illness of depression. The data (N=323) used in this example are taken from Geiser (2012) and consist of six indicators of depression. In the data, X1,1 and X1,2 are indicators of the first-order common factor Depression State 1, X1,3 and X1,4 are indicators of the first-order common factor Depression State 2, and X1,5 and X1,6 are indicators of the first-order common factor Depression State 3. The three factors themselves are indicators of the second-order common trait factor Depression. The model additionally contains an indicator-specific factor labeled IS. Indicators X1,1, X1,2, X1,3, X1,5, and the factor Depression State 1 serve as marker variables. The path diagram illustrating the structure of the model is shown in Figure 1.

Figure 1. Geiser (2012).

Results of the estimation are presented in Table 1. As seen from the table, the number of iterations and function evaluations are (It, Fe) = (23, 375) for SNLLS and (It, Fe) = (145, 5439) for NLLS. As expected, the required computational load for minimizing F(ϑβ,γ,λ) w.r.t. ϑβ,γ,λ is far less than the required load for minimizing F(ϑ) w.r.t. ϑ. Figure 2 shows the convergence profiles for the two implementations. From the figure, it is clear that the SNLLS objective function F(ϑβ,γ,λ) starts at a point much closer to its minimum of 0.0109 than what is seen for the NLLS objective function F(ϑ). In terms of estimation time, the mean time is 0.0334 sec. for SNLLS and 0.1408 sec. for NLLS. Thus, SNLLS is faster by a factor of 0.1408/0.0334 = 4.2153. The results in this example clearly suggest that the SNLLS implementation is numerically more efficient than the standard NLLS implementation.

Figure 2. Convergence profile, Geiser (2012).

Table 1. Timing results, Geiser (2012).

4.2. Example 2

The second example considers a model for industrialization and political democracy. The model is taken from Bollen (1989), and has been used extensively in books, tutorials, etc. The data consist of 11 indicators of industrialization and political democracy for 75 countries (N=75). In the data, X1,1, …, X1,4 are indicators of the common factor Political Democracy at time 1 (1960), X1,5, …, X1,8 are indicators of the common factor Political Democracy at time 2 (1965) and X2,1, …, X2,3 are indicators of the common factor Industrialization at time 1 (1960). Due to the repeated measurement design, the unique factors belonging to X1,i and X1,i+4 for i = 1, …, 4 are set to correlate. Additionally, the unique factors belonging to X1,i and X1,i+2 for i = 2, 6 are set to correlate. Indicators X1,1, X1,5 and X2,1 serve as marker variables. The path diagram of the model is shown in Figure 3.

Figure 3. Bollen (1989).

Results of the estimation are presented in Table 2. The results in this example generally confirm the results from the previous example. In this case, the number of iterations and function evaluations are (It, Fe) = (26, 759) for SNLLS and (It, Fe) = (230, 14742) for NLLS. Figure 4 shows the convergence profiles for the two implementations. The patterns in the figure resemble those in Figure 2. Considering the estimation time, the mean time is 0.1257 sec. for SNLLS and 0.8263 sec. for NLLS. In this case, SNLLS proves to be faster by a factor of 0.8263/0.1257 = 6.5736.

Figure 4. Convergence profile, Bollen (1989).

Table 2. Timing results, Bollen (1989).

5. Concluding Remarks

In this study, we have presented an SNLLS implementation of the MD objective function for estimating SEMs. The outlined framework includes all necessary expressions for applying SNLLS, and represents a generalization of previously known results. Using examples from the SEM literature, we demonstrated that the computational load of applying SNLLS is considerably less than that of applying NLLS. Another benefit of SNLLS is that fewer starting values are required, which may mitigate potential problems due to the somewhat arbitrary choice of starting values for the covariance parameters.

The present work may have several interesting extensions. First, as prior research has shown, SNLLS may hold potential for improving numerical performance in situations in which the estimation problem is ill-conditioned. Thus, an interesting avenue for future research would be to compare the numerical performance of SNLLS and NLLS under more challenging conditions in which the condition number of the observed covariance matrix is large. Second, the SNLLS implementation is not yet available for maximum likelihood (ML) estimation. Some initial work on this topic is underway. This work, combined with the previous point, may lead to an improved implementation of the ML estimator when applied to SEMs.

References

  • Bollen, K. A. (1989). Structural equations with latent variables. Wiley.
  • Dattner, I., Ship, H., & Voit, E. O. (2020). Separable nonlinear least-squares parameter estimation for complex dynamic systems. Complexity, 2020, 1–11. https://doi.org/10.1155/2020/6403641
  • Geiser, C. (2012). Data analysis with Mplus. Guilford Press.
  • Golub, G. H., & Pereyra, V. (1973). The differentiation of pseudo-inverses and nonlinear least squares problems whose variables separate. SIAM Journal on Numerical Analysis, 10, 413–432. https://doi.org/10.1137/0710036
  • Golub, G. H., & Pereyra, V. (2003). Separable nonlinear least squares: The variable projection method and its applications. Inverse Problems, 19, R1–R26. https://doi.org/10.1088/0266-5611/19/2/201
  • Hägglund, G. (1982). Factor analysis by instrumental variables methods. Psychometrika, 47, 209–222. https://doi.org/10.1007/BF02296276
  • Kreiberg, D., Marcoulides, K., & Olsson, U. H. (2021). A faster procedure for estimating CFA models applying minimum distance estimators with a fixed weight matrix. Structural Equation Modeling: A Multidisciplinary Journal, 28, 725–739. https://doi.org/10.1080/10705511.2020.1835484
  • Kreiberg, D., Söderström, T., & Yang-Wallentin, F. (2016). Errors-in-variables system identification using structural equation modeling. Automatica, 66, 218–230. https://doi.org/10.1016/j.automatica.2015.12.007
  • Magnus, J. R., & Neudecker, H. (1999). Matrix differential calculus with applications in statistics and econometrics. John Wiley & Sons.
  • MATLAB version R2020b. (2020). The MathWorks Inc.
  • Mullen, K. M. (2008). Separable nonlinear models: Theory, implementation and applications in physics and chemistry [Ph.D. thesis]. Vrije Universiteit Amsterdam.
  • R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing. http://www.R-project.org/
  • Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48, 1–36. https://doi.org/10.18637/jss.v048.i02
  • Sjöberg, J., & Viberg, M. (1997). Separable non-linear least-squares minimization-possible improvements for neural net fitting. In Neural networks for signal processing VII. Proceedings of the 1997 IEEE signal processing society workshop (pp. 345–354). IEEE.
  • Söderström, T., & Mossberg, M. (2011). Accuracy analysis of a covariance matching approach for identifying errors-in-variables systems. Automatica, 47, 272–282. https://doi.org/10.1016/j.automatica.2010.10.046
  • Söderström, T., Mossberg, M., & Hong, M. (2009). A covariance matching approach for identifying errors-in-variables systems. Automatica, 45, 2018–2031. https://doi.org/10.1016/j.automatica.2009.05.010

Appendices

A. Deriving L and K  

Let x be an arbitrary p×1 random vector, and let Σx = {σi,j} be the associated p×p covariance matrix. The purpose of the following presentation is to introduce an algebraic framework that facilitates eliminating the redundancy originating from the structure of Σx. To do so, let Kx be a matrix such that (A1) σx = KxTvec(Σx), where σx is a covariance vector containing the nonredundant elements of Σx and Kx is a matrix obtained by (A2) Kx = Lx(LxTLx)⁻¹. In this expression, Lx is a selection matrix (i.e., a matrix composed of zeros and ones).

Below, we propose a rather general framework that applies to any structure characterizing Σx. Before presenting some examples of how to obtain Lx, it is necessary to introduce some additional notation. Let E(u,v) = {ei,j(u,v)} denote a p×p matrix (for i,j = 1, …, p) with elements (A3) ei,j(u,v) = 1 if σi,j = σu,v, and 0 otherwise. Next, we demonstrate how to obtain Lx for two standard cases and one case specialized for the SNLLS implementation.

Case 1:

As a start, consider the case in which Σx is symmetric and no other restrictions are placed on its elements. The covariance vector containing the ½p(p+1) nonredundant elements of Σx is (A4) σx = (σ1,1 … σp,1 σ2,2 … σp,2 … σp,p)T. Applying (A3), the matrix Lx is formed by horizontally concatenating ½p(p+1) vectors using (A5) Lx = (vec(E(1,1)) … vec(E(p,1)) vec(E(2,2)) … vec(E(p,2)) … vec(E(p,p))).

Case 2:

Now, consider the case in which Σx is diagonal. The covariance vector containing the p nonredundant elements of Σx is given by (A6) σx = (σ1,1 σ2,2 … σp,p)T. The matrix Lx is now formed by horizontally concatenating p vectors (A7) Lx = (vec(E(1,1)) vec(E(2,2)) … vec(E(p,p))). Before introducing the third and final case, it is necessary to expand the notation. Let x1 and x2 be respectively p1×1 and p2×1 random vectors, and let x be a p = p1 + p2 dimensional column vector obtained by stacking x1 and x2 in the following way (A8) x = (x1T x2T)T. The associated p×p covariance matrix is given by (A9) Σx = ( Σx1  Σx2,x1T ; Σx2,x1  Σx2 ), where the blocks are of dimensions p1×p1, p1×p2, p2×p1, and p2×p2, respectively.

Case 3:

As in the first case, suppose that no other restrictions, apart from symmetry, are placed on the elements of Σx. Let the covariance vector containing the ½p(p+1) nonredundant elements of Σx be given by (A10) σ̄x = (σx1T σx2T σx2,x1T)T, where (A11) σx1 = (σ1,1 … σp1,1 σ2,2 … σp1,2 … σp1,p1)T, (A12) σx2 = (σp1+1,p1+1 … σp,p1+1 σp1+2,p1+2 … σp,p1+2 … σp,p)T, (A13) σx2,x1 = (σp1+1,1 … σp,1 σp1+1,2 … σp,2 … σp,p1)T.

Construct a matrix L̄x by horizontally concatenating three matrices (A14) L̄x = ((L̄x)1,1 (L̄x)1,2 (L̄x)1,3), where the submatrices are given by (A15) (L̄x)1,1 = (vec(E(1,1)) … vec(E(p1,1)) vec(E(2,2)) … vec(E(p1,2)) … vec(E(p1,p1))), (A16) (L̄x)1,2 = (vec(E(p1+1,p1+1)) … vec(E(p,p1+1)) vec(E(p1+2,p1+2)) … vec(E(p,p1+2)) … vec(E(p,p))), (A17) (L̄x)1,3 = (vec(E(p1+1,1)) … vec(E(p,1)) vec(E(p1+1,2)) … vec(E(p,2)) … vec(E(p,p1))). The number of columns in (A15), (A16), and (A17) is ½p1(p1+1), ½p2(p2+1) and p2×p1, respectively.
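To illustrate Case 3, the following R sketch (our own, with hypothetical function names) constructs L̄x column by column from the indicator matrices E(u,v) of Equation (A3), following the column ordering in (A15)–(A17).

```r
## Building L_bar_x for Case 3 via the indicator matrices E(u, v) of (A3).
## idx assigns the same label to equal elements of a symmetric Sigma_x, so
## E(u, v) can be formed by comparing labels.
sym_index <- function(p) {
  outer(1:p, 1:p, function(i, j) pmin(i, j) * (p + 1) + pmax(i, j))
}
E_uv <- function(idx, u, v) (idx == idx[u, v]) * 1        # Equation (A3)

Lbar_x <- function(p1, p2) {
  p   <- p1 + p2
  idx <- sym_index(p)
  col_of <- function(u, v) as.vector(E_uv(idx, u, v))     # vec(E(u, v))
  cols <- list()
  for (v in 1:p1)     for (u in v:p1)       cols <- c(cols, list(col_of(u, v)))  # (A15)
  for (v in (p1+1):p) for (u in v:p)        cols <- c(cols, list(col_of(u, v)))  # (A16)
  for (v in 1:p1)     for (u in (p1+1):p)   cols <- c(cols, list(col_of(u, v)))  # (A17)
  do.call(cbind, cols)
}

## K_bar_x then follows from Equation (A2): Kbar <- L %*% solve(crossprod(L)).
```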

B. Deriving G(ϑβ, γ, λ)

The derivation below uses the following matrix identity (B1) vec(ABC) = (CT ⊗ A)vec(B), where A, B, and C are matrices of compatible sizes. In addition, we make use of the following relations (B2) vec(Σξ) = Lξσξ, vec(Σδ) = Lδσδ, vec(Σϵ1) = Lϵ1σϵ1, vec(Σϵ2) = Lϵ2σϵ2. The model-implied covariance matrix is (B3) Σx(ϑ) = ( Σx1(ϑ)  Σx2,x1T(ϑ) ; Σx2,x1(ϑ)  Σx2(ϑ) ) = ( Λ1H(ΓΣξΓT + Σδ)HTΛ1T + Σϵ1   Λ1HΓΣξΛ2T + Σϵ2,ϵ1T ; Λ2ΣξΓTHTΛ1T + Σϵ2,ϵ1   Λ2ΣξΛ2T + Σϵ2 ) = ( Λ1HΓΣξΓTHTΛ1T + Λ1HΣδHTΛ1T + Σϵ1   Λ1HΓΣξΛ2T + Σϵ2,ϵ1T ; Λ2ΣξΓTHTΛ1T + Σϵ2,ϵ1   Λ2ΣξΛ2T + Σϵ2 ). Applying SNLLS, the key is to express the model-implied covariance vector using the form (B4) σ̄x(ϑ) = (σx1T(ϑ) σx2T(ϑ) σx2,x1T(ϑ))T := G(ϑβ,γ,λ)σξ,δ,ϵ. To do so, it is necessary to vectorize the individual blocks of (B3). Starting with the block Σx1(ϑ), we have (B5) σx1(ϑ) = Kx1Tvec(Σx1(ϑ)) = Kx1Tvec(Λ1HΓΣξΓTHTΛ1T) + Kx1Tvec(Λ1HΣδHTΛ1T) + Kx1Tvec(Σϵ1) = Kx1T(Λ1HΓ ⊗ Λ1HΓ)vec(Σξ) + Kx1T(Λ1H ⊗ Λ1H)vec(Σδ) + Kx1Tvec(Σϵ1). Using (B1) and (B2), it follows that (B6) σx1(ϑ) = Kx1T(Λ1HΓ ⊗ Λ1HΓ)Lξσξ + Kx1T(Λ1H ⊗ Λ1H)Lδσδ + Kx1TLϵ1σϵ1 = (Kx1T(Λ1HΓ ⊗ Λ1HΓ)Lξ  Kx1T(Λ1H ⊗ Λ1H)Lδ  Kx1TLϵ1  0  0)σξ,δ,ϵ. Next, we consider the block Σx2(ϑ). Using the same procedure as before, we have (B7) σx2(ϑ) = Kx2Tvec(Σx2(ϑ)) = Kx2Tvec(Λ2ΣξΛ2T) + Kx2Tvec(Σϵ2) = Kx2T(Λ2 ⊗ Λ2)vec(Σξ) + Kx2Tvec(Σϵ2) = Kx2T(Λ2 ⊗ Λ2)Lξσξ + Kx2TLϵ2σϵ2 = (Kx2T(Λ2 ⊗ Λ2)Lξ  0  0  Kx2TLϵ2  0)σξ,δ,ϵ. Finally, for the block Σx2,x1(ϑ), it follows that (B8) σx2,x1(ϑ) = vec(Σx2,x1(ϑ)) = vec(Λ2ΣξΓTHTΛ1T) + vec(Σϵ2,ϵ1) = (Λ1HΓ ⊗ Λ2)vec(Σξ) + vec(Σϵ2,ϵ1) = (Λ1HΓ ⊗ Λ2)Lξσξ + σϵ2,ϵ1 = ((Λ1HΓ ⊗ Λ2)Lξ  0  0  0  I)σξ,δ,ϵ. Putting the pieces together, we obtain (B9) (σx1(ϑ) ; σx2(ϑ) ; σx2,x1(ϑ)) = ( Kx1T(Λ1HΓ ⊗ Λ1HΓ)Lξ  Kx1T(Λ1H ⊗ Λ1H)Lδ  Kx1TLϵ1  0  0 ; Kx2T(Λ2 ⊗ Λ2)Lξ  0  0  Kx2TLϵ2  0 ; (Λ1HΓ ⊗ Λ2)Lξ  0  0  0  I ) × σξ,δ,ϵ.
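Under the same illustrative conventions as the earlier sketches (all argument names ours; selection matrices built as in Appendix A), the block matrix in Equations (26)/(B9) can be assembled in R as follows.

```r
## Assembling G(theta_bgl) as in Equations (26)/(B9).
G_matrix <- function(B, Gamma, Lambda1, Lambda2,
                     K1, K2, L_xi, L_delta, L_e1, L_e2) {
  H   <- solve(diag(nrow(B)) - B)            # H = (I - B)^{-1}
  A1  <- Lambda1 %*% H %*% Gamma             # Lambda1 H Gamma
  A2  <- Lambda1 %*% H                       # Lambda1 H
  p1  <- nrow(Lambda1); p2 <- nrow(Lambda2)
  h1  <- ncol(K1);      h2  <- ncol(K2)
  hd  <- ncol(L_delta); he1 <- ncol(L_e1);  he2 <- ncol(L_e2)
  zero <- function(r, c) matrix(0, r, c)
  row1 <- cbind(t(K1) %*% kronecker(A1, A1) %*% L_xi,       # sigma_xi block
                t(K1) %*% kronecker(A2, A2) %*% L_delta,    # sigma_delta block
                t(K1) %*% L_e1,                             # sigma_eps1 block
                zero(h1, he2), zero(h1, p1 * p2))
  row2 <- cbind(t(K2) %*% kronecker(Lambda2, Lambda2) %*% L_xi,
                zero(h2, hd), zero(h2, he1),
                t(K2) %*% L_e2, zero(h2, p1 * p2))
  row3 <- cbind(kronecker(A1, Lambda2) %*% L_xi,
                zero(p1 * p2, hd), zero(p1 * p2, he1),
                zero(p1 * p2, he2), diag(p1 * p2))
  rbind(row1, row2, row3)
}
```

A function of this kind, passed as G_fun, is what the SNLLS sketch in Section 3 takes as input.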