
Moment conditions for the quadratic regression model with measurement error


Abstract

We consider a new estimator for the quadratic errors-in-variables model that exploits higher-order moment conditions under the assumption that the distribution of the measurement error is symmetric and free of excess kurtosis. Our approach contributes to the literature by not requiring any side information and by straightforwardly allowing for one or more error-free control variables. We propose a Wald-type statistical test, based on an auxiliary method-of-moments estimator, to verify a necessary condition for our estimator’s consistency. We derive the asymptotic properties of the estimator and the statistical test and illustrate their finite-sample properties by means of a simulation study and an empirical application to existing data from the literature. Our simulations show that the method-of-moments estimator performs well in terms of bias and variance and even exhibits a certain degree of robustness to the distributional assumptions about the measurement error. In the simulation experiments where such robustness is not present, our statistical test already has high power for relatively small samples.


1. Introduction

The quadratic regression model is widely relevant in economics and business research. A classical example is the Kuznets curve, which reflects the inverted-U shaped impact of economic development on income inequality (Kuznets, Citation1955). A version that has recently become popular is the environmental Kuznets curve, with environmental quality taking the place of income inequality; see, for example, Lee and Oh (Citation2015). Quadratic regression models have also been used to capture the relation between firms’ input factor costs and output quantities, GDP growth and democracy, crime and inequality and patents and competition (e.g., Aghion et al., Citation2005; Barro, Citation1996; Martínez-Budría et al., Citation2003; Zhu and Li, Citation2017). In yet another area, Haans et al. (Citation2016) found that one out of nine papers published in the Strategic Management Journal from 2008 to 2012 involved quadratic relations. The quadratic errors-in-variables model has become particularly popular for the study of Engel curves, which describe the relation between household expenditure and household income (e.g., Biørn, Citation2017; Hausman et al., Citation1995; Kedir and Girma, Citation2007; Lewbel, Citation1997).

Griliches and Ringstad (Citation1970) were the first to underline the importance of correcting for measurement error in quadratic regression models. They showed that the effect of measurement error is exacerbated by the quadratic term in a regression model with a normally distributed unobserved regressor and normal measurement error. Ever since, an increasingly large literature on the consistent estimation of the non-linear measurement error model has emerged.

Many estimation methods for the quadratic and polynomial measurement-error model assume that the variance of the measurement error is known or, alternatively, that the reliability or the signal-to-noise ratio is known (e.g., Carroll et al., Citation2006; Kuha and Temple, Citation2007; Kukush et al., Citation2005; Schneeweiss and Augustin, Citation2006).Footnote1 These estimators have limited relevance in economics, where such prior information is typically unavailable.

Estimators for the quadratic and polynomial measurement-error model that do not make such assumptions are more scarce; see the upper part of Table 1. The earliest study we know of is Van Montfort (Citation1989), who uses the method of moments. He exploits moments up to order three to obtain consistent estimators for the quadratic measurement-error model with a normally distributed unobserved regressor. Lewbel (Citation1997, p. 1206) briefly mentions the possibility to construct a method-of-moments estimator for the quadratic regression model with normal measurement errors. The proposed estimator is based on moments up to order five. Huang and Huwang (Citation2001) derive consistent estimators for the polynomial measurement-error model without additional identifying information. They use a regression-calibration approach and impose normality on both the measurement error and the unobserved regressor. Other methods require either replicated measurements on the error-ridden regressor or instrumental variables (Biørn, Citation2017; Hausman et al., Citation1991, Citation1995; Kedir and Girma, Citation2007; Lewbel, Citation1996; Li, Citation2002).

Table 1. Overview of the literature.

Another strand of literature considers a general, non-linear parametric regression function depending on an unknown parameter vector and proposes methods to consistently estimate this vector in the presence of measurement error; see the lower part of Table 1. Some of these studies require replicated measurements (e.g., Garcia and Ma, Citation2017; Hausman et al., Citation1988; Li, Citation2002). Others use external instrumental variables (Hu and Schennach, Citation2008) or take certain covariates as instruments (Ben-Moshe et al., Citation2017). Tsiatis and Ma (Citation2004) assume that the distribution of the measurement error is known or that replicated measurements are available such that the unknown parameters of this distribution can be estimated. The semi-parametric estimator of Schennach and Hu (Citation2013) does not require such assumptions and is consistent under general conditions. The empirical implementation of this approach is based on sieve densities. Another very general, but highly computer-intensive approach has been proposed by Schennach (Citation2014). Overviews of the literature on non-linear measurement-error models can be found in Chen et al. (Citation2011) and Schennach (Citation2016).

The present study proposes a new consistent estimator for the quadratic errors-in-variables model, which exploits moments up to order four. Our estimator takes an intermediate position relative to the existing literature. We assume a symmetric measurement-error distribution without excess kurtosis, for which normality is a sufficient but not a necessary condition. Under these assumptions, we obtain a relatively efficient estimator. Unlike several other studies, we do not require any side information, such as a known measurement error variance, replicated measurements, or instrumental variables. Furthermore, our approach straightforwardly allows for one or more error-free control variables, which only requires the standard assumption that these regressors are independent of the measurement and regression errors. For other methods, such as the one proposed by Schennach and Hu (Citation2013), the inclusion of error-free regressors requires certain assumptions about the conditional distribution of the unobserved regressor given the error-free control variables.

We also propose a Wald-type statistical test, based on an auxiliary method-of-moments estimator, to verify a necessary condition for the consistency of our method-of-moments estimator. We derive the asymptotic properties of our method-of-moments estimator and the statistical test. We illustrate their finite-sample properties in several Monte Carlo simulation experiments and in an empirical application to existing data from the literature. In the simulation study, we compare the method-of-moments estimator to the inconsistent OLS estimator and the consistent sieve-based estimator of Schennach and Hu (Citation2013). Because OLS and the sieve-based approach represent two ends of the spectrum, we use them as a benchmark.

Our simulation study shows that the method-of-moments estimator performs well in terms of bias and variance and even exhibits a certain degree of robustness to deviations from the assumption that the measurement error has a symmetric distribution without excess kurtosis. In the simulation experiments where such robustness is not present, our statistical test already has high power for relatively small samples. The method-of-moments estimator generally outperforms the OLS estimator in terms of attenuation bias and also performs well in comparison to the semi-parametric estimator of Schennach and Hu (Citation2013) in the normal and symmetric case. The latter estimator is consistent under fairly general conditions, but its optimal performance turns out to be difficult to achieve in practice. The main problem is the use of interior-point optimization for the constrained optimization of the log-likelihood function. We experiment with different starting values for the optimization and observe that this choice matters considerably, a well-known problem in the literature (Gertz et al., Citation2004). Our simulations also illustrate the drawback of the sieve-based method's assumptions about the conditional distribution of the unobserved regressor given the error-free control variables.

Under the assumption of Schennach and Hu (Citation2013) that “measurement error is not sufficiently severe to completely alter the shape of the specification,” we recommend considering our method-of-moments estimator as a potential candidate if OLS reveals a quadratic relation. On the basis of our theoretical analysis and simulation study, we recommend our estimator as the final choice if the Wald test fails to reject. We also discuss the possibility of combining our initial method-of-moments estimator with the auxiliary estimator (used for the Wald test) by means of model averaging.

In an empirical application, we use the well-known Boston data set (Harrison and Rubinfeld, Citation1978) and study the impact of a neighborhood's socio-economic status on the housing prices in that area. Because our statistical test does not reject, we base our subsequent inference on the method-of-moments estimator that assumes a symmetric measurement-error distribution without excess kurtosis. We establish significant measurement error, resulting in a reliability of around 80%. However, we are faced with a counter-intuitive sign of one of the control variables' coefficient estimates, which remains present after winsorization of the data. This could be an indication that the standard quadratic location-shift regression model is too restrictive and that we need quantile regression to account for the effect that certain housing characteristics are priced differently for houses in the upper-price range as compared to houses in the lower-price range (Zietz et al., Citation2008). Alternatively, it could indicate a source of endogeneity, caused by simultaneity or omitted variables. This would require an approach that can deal with both measurement error and additional sources of endogeneity (e.g., Hu et al., Citation2015, Citation2016; Song et al., Citation2015).

Our approach directly extends the strand of literature initiated by Geary (Citation1942), who introduced the moment-based approach for the linear measurement-error model and whose approach was elaborated on by many others (Cragg, Citation1997; Dagenais and Dagenais, Citation1997; Erickson and Whited, Citation2000, Citation2012; Kendall and Stuart, Citation1973; Meijer et al., Citation2017; Pal, Citation1980; Scott, Citation1950; Van Montfort et al., Citation1989).

The setup of the remainder of this paper is as follows. Section 2 analyzes the effects of ignoring measurement error in the quadratic measurement-error model by deriving the attenuation bias of the OLS estimator. The outline of our approach is sketched in Section 3, followed by the details of our method-of-moments estimator that assumes a symmetric measurement-error distribution without excess kurtosis (referred to as “MM1”). Section 4 proposes a Wald test based on an auxiliary method-of-moments estimator (“MM2”) to test a necessary condition for the consistency of MM1. The sieve-based approach of Schennach and Hu (Citation2013) acts as our benchmark approach together with OLS and is described in Section 5. The results of a simulation study and an empirical application are discussed in Sections 6 and 7, respectively. Finally, Section 8 provides discussion and conclusions. An online appendix with supplementary material is available.

2. Attenuation bias of OLS

This section focuses on the largely ignored insights into the OLS estimator's attenuation bias offered by the quadratic errors-in-variables model in which both the measurement error and the unobserved regressor are normally distributed. This analysis extends Griliches and Ringstad (Citation1970), Van Montfort (Citation1989) and Wansbeek and Meijer (Citation2000).

For a generic observation, hence omitting subscripts labeling observations, we write the quadratic regression model with measurement error as
(1) $$y = \alpha + \beta\xi + \gamma\xi^2 + \varepsilon; \qquad x = \xi + v,$$
where x is observed, ξ is unobserved, v is the measurement error and ε the regression error. We adopt the standard assumptions that ξ, ε and v are mutually independent and that both the regression error ε and the measurement error v have mean zero and variances $\sigma_\varepsilon^2$ and $\sigma_v^2$, respectively. For the sake of analytical tractability in the calculations that follow, we assume $v \sim N(0, \sigma_v^2)$ and $\xi \sim N(\mu_1, \sigma_\xi^2)$. We denote $\mu_k \equiv E(\xi^k)$.

We start with the measurement error bias of the OLS estimators of α, β and γ. Because of the normality of ξ, we have
$$\mathrm{Cov}(\xi, \xi^2) = E(\xi^3) - E(\xi)E(\xi^2) = \mu_3 - \mu_1\mu_2 = 2\mu_1\sigma_\xi^2,$$
$$\mathrm{Var}(\xi^2) = E(\xi^4) - E(\xi^2)^2 = \mu_4 - \mu_2^2 = 2\sigma_\xi^4 + 4\mu_1^2\sigma_\xi^2.$$

Now let
(2) $$A_\xi \equiv \begin{pmatrix} \mathrm{Var}(\xi) & \mathrm{Cov}(\xi,\xi^2) \\ \mathrm{Cov}(\xi,\xi^2) & \mathrm{Var}(\xi^2) \end{pmatrix} = \sigma_\xi^2 \begin{pmatrix} 1 & 2\mu_1 \\ 2\mu_1 & 2\sigma_\xi^2 + 4\mu_1^2 \end{pmatrix}.$$

Hence, with $\sigma_x^2 \equiv \sigma_\xi^2 + \sigma_v^2$, the normality of v implies
(3) $$A_x \equiv \begin{pmatrix} \mathrm{Var}(x) & \mathrm{Cov}(x,x^2) \\ \mathrm{Cov}(x,x^2) & \mathrm{Var}(x^2) \end{pmatrix} = \sigma_x^2 \begin{pmatrix} 1 & 2\mu_1 \\ 2\mu_1 & 2\sigma_x^2 + 4\mu_1^2 \end{pmatrix}.$$

Let, with reliability $\rho \equiv \sigma_\xi^2/\sigma_x^2$,
(4) $$B \equiv \begin{pmatrix} 1 & 2(1-\rho)\mu_1 \\ 0 & \rho \end{pmatrix}.$$

Then $A_\xi = \rho A_x B$ and
(5) $$\mathrm{plim}_{n\to\infty} \begin{pmatrix} \hat\beta \\ \hat\gamma \end{pmatrix} = A_x^{-1} A_\xi \begin{pmatrix} \beta \\ \gamma \end{pmatrix} = \rho B \begin{pmatrix} \beta \\ \gamma \end{pmatrix} = \begin{pmatrix} \rho\beta + 2\rho(1-\rho)\gamma\mu_1 \\ \rho^2\gamma \end{pmatrix},$$
where n denotes the sample size. This result was derived by Griliches and Ringstad (Citation1970) only for the special case where $\mu_1 = 0$. The value of ξ where $E(y|\xi)$ has its minimum (γ > 0) or maximum (γ < 0) is the turning point $\tau \equiv -\beta/(2\gamma)$, for which we have
(6) $$\tau^* \equiv \mathrm{plim}_{n\to\infty}\hat\tau = \mathrm{plim}_{n\to\infty}\left(-\frac{\hat\beta}{2\hat\gamma}\right) = \frac{\tau - (1-\rho)\mu_1}{\rho}.$$

Because $\tau = \rho\tau^* + (1-\rho)\mu_1$, we observe that τ is overestimated when $\tau > \mu_1$ and underestimated when $\tau < \mu_1$. Note that $\tau^*$ and $\mu_1$ can be estimated consistently, so τ is consistently bounded. Let $\pi \equiv \rho(1-\rho)\gamma\sigma_x^2 = (1-\rho)\gamma\sigma_\xi^2$ and note that $\mu_2 = \rho\sigma_x^2 + \mu_1^2$. Then for the OLS estimator $\hat\alpha$ of α
(7) $$\mathrm{plim}_{n\to\infty}\hat\alpha = E(y) - \big(\mathrm{plim}_{n\to\infty}\hat\beta\big)E(x) - \big(\mathrm{plim}_{n\to\infty}\hat\gamma\big)E(x^2) = \big[\alpha + \beta\mu_1 + \gamma(\rho\sigma_x^2 + \mu_1^2)\big] - \big[\rho\beta + 2\rho(1-\rho)\gamma\mu_1\big]\mu_1 - \rho^2\gamma(\sigma_x^2 + \mu_1^2) = \alpha + \gamma(\rho^2\tau^{*2} - \tau^2) + \pi.$$

Let $y_{\max} \equiv \alpha + \beta\tau + \gamma\tau^2 = \alpha - \gamma\tau^2$ be the minimum (γ > 0) or maximum (γ < 0) value of $E(y|\xi)$. With measurement error, its estimated counterpart converges to $y_{\max}^* = \alpha + \gamma(\rho^2\tau^{*2} - \tau^2) + \pi - \rho^2\gamma\tau^{*2} = y_{\max} + (1-\rho)\gamma\sigma_\xi^2$. The results are depicted in Figure 1. The attenuation effect, well-known from the linear errors-in-variables model, shows up in the quadratic model in two forms. First, the graph has less curvature. Second, the minimal value is higher (lower if γ < 0). Another effect is that the value of ξ where the minimum is attained is pushed away from its true value, but this can be in either direction depending on the position of τ relative to $\mu_1$. These attenuation effects emphasize the importance of controlling for measurement error in the quadratic errors-in-variables model.
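To make these probability limits concrete, the following minimal sketch (with made-up parameter values; the numbers are purely illustrative and not taken from the paper) evaluates the closed-form expressions in (5)–(7):

```python
# Illustrative evaluation of Eqs. (5)-(7): probability limits of the OLS
# estimates in the quadratic model with normal xi and v. All parameter values
# below are assumptions chosen for the example.
alpha, beta, gamma = 1.0, 1.0, 1.0      # true coefficients (assumed)
mu1, sig2_xi, sig2_v = 1.0, 1.0, 0.2    # E(xi), Var(xi), Var(v) (assumed)

sig2_x = sig2_xi + sig2_v               # Var(x)
rho = sig2_xi / sig2_x                  # reliability

beta_ols = rho * beta + 2 * rho * (1 - rho) * gamma * mu1   # Eq. (5)
gamma_ols = rho**2 * gamma                                  # Eq. (5)

tau = -beta / (2 * gamma)                                   # true turning point
tau_star = (tau - (1 - rho) * mu1) / rho                    # Eq. (6)

pi = (1 - rho) * gamma * sig2_xi
alpha_ols = alpha + gamma * (rho**2 * tau_star**2 - tau**2) + pi   # Eq. (7)

print(beta_ols, gamma_ols, tau_star, alpha_ols)
```

With these values the curvature estimate shrinks from γ = 1 to ρ²γ ≈ 0.69, illustrating the attenuation of the quadratic term.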

Figure 1. The effect of measurement error on the OLS-estimated curve.

Notes: The solid curve indicates the true relation, while the dotted curve reflects the estimated relation.


3. Method-of-moments estimation

This section focuses on the quadratic regression model with measurement error given by (1). We derive our method-of-moments estimator, discuss its identification and provide an extension to additional error-free control variables. Throughout, we maintain the assumptions that ξ, ε and v are mutually independent and that both the regression error ε and the measurement error v have mean zero and variances σε2 and σv2, respectively. The only distributional assumption that we make is that v is symmetrically distributed and free of excess kurtosis, for which normality is a sufficient but not a necessary condition. Hence, in contrast to Section 2, we no longer impose any distributional assumptions on ξ.

3.1. Global outline of the approach

Our approach is to harvest enough moment conditions for consistent estimation. There are two first moments of y and x, three second moments, four third moments, and so on. If we use moments up to order k, their number adds up to $2 + 3 + 4 + \cdots + (k+1) = k(k+3)/2$. The expectation of the first k moments of y and x involves the parameters $\mu_j$, $j = 1, \ldots, 2k$. For v and ε, the number of moments is k − 1 each, since $E(v) = E(\varepsilon) = 0$. There are three other parameters, namely α, β and γ. Hence, without assuming normality of any of the random terms, there are 4k + 1 parameters in total and a necessary condition for identification is $k(k+3)/2 \geq 4k + 1$, or $k^2 - 5k - 2 \geq 0$. Hence, if we do not impose any further structure on the distribution of v, we need moments of at least order six; the short counting check below illustrates this. Estimators using such higher-order moments are expected to be sensitive to outliers, because the impact of extreme values on sample means is amplified by raising these large values to a high power. Under symmetry and zero excess kurtosis, the moments of v are fully determined by $\sigma_v^2$. As a result, the parameters of the quadratic measurement-error model are identifiable from the first four moments of y and x.
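A minimal numerical check of this counting argument (illustrative only):

```python
# Number of moment conditions up to order k versus the 4k + 1 free parameters
# when no further structure is imposed on the distribution of v.
for k in range(2, 9):
    n_moments = k * (k + 3) // 2   # 2 + 3 + ... + (k + 1)
    n_params = 4 * k + 1           # mu_1..mu_2k, k-1 moments of v, k-1 of eps, alpha, beta, gamma
    print(k, n_moments, n_params, n_moments >= n_params)
# The necessary condition first holds at k = 6, in line with the text.
```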

The price we pay for the assumptions we impose on v is the risk of misspecification. Later we will therefore develop a statistical test to verify a necessary condition for the consistency of our method-of-moments estimator. This test is based on an auxiliary method-of-moments estimator that is consistent under symmetric measurement error, which requires moments up to order five.

3.2. Method-of-moments estimation

We formulate the following set of assumptions:

Assumptions 3.1.

  1. We observe y and x, which come from the quadratic measurement-error model in (1).

  2. ξ, ε and v are mutually independent with E(ε)=E(v)=0,E(ε2)=σε2 and E(v2)=σv2.

  3. v is symmetric. More specifically, (a) $E(v^3) = 0$ and (b) $E(v^5) = 0$.

  4. v is free of excess kurtosis; i.e., $\kappa_v = E(v^4)/\sigma_v^4 = 3$.

We note that the assumption of mutual independence of ξ, v and ε is in line with, e.g., Schennach and Hu (Citation2013). We impose this assumption to ensure that the expectations of certain products of random variables reduce to the products of the expectations. The same effect could be achieved by imposing less stringent covariance assumptions of the form $\mathrm{Cov}(x_1^k, x_2^\ell) = 0$, for appropriate values of k and ℓ and with $x_1, x_2 \in \{v, \varepsilon, \xi\}$, $x_1 \neq x_2$.

Under Assumptions 3.1 (i)–(iii), we find
(8) $E(x) = \mu_1$
(9) $E(x^2) = \mu_2 + \sigma_v^2$
(10) $E(x^3) = 3\mu_1\sigma_v^2 + \mu_3$
(11) $E(x^4) = E(\xi^4 + 6\xi^2 v^2 + v^4) = 6\mu_2\sigma_v^2 + \mu_4 + \kappa_v\sigma_v^4$
(12) $E(x^5) = E(\xi^5 + 10\xi^3 v^2 + 5\xi v^4) = \mu_5 + 10\mu_3\sigma_v^2 + 5\mu_1\kappa_v\sigma_v^4,$
where $\kappa_v \equiv E(v^4)/\sigma_v^4$ denotes the kurtosis of v. Moment conditions (8)–(12) use $E(v) = 0$, while (10) and (11) also use $E(v^3) = 0$ (i.e., Assumption 3.1 (iii-a)). Moment condition (12) additionally uses $E(v^5) = 0$ (i.e., Assumption 3.1 (iii-b)).

With $\pi_v \equiv (6 - \kappa_v)\sigma_v^4$, we can rewrite these moment conditions as
(13) $$\begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \\ \mu_4 \\ \mu_5 \end{pmatrix} = E\begin{pmatrix} m_1 \\ m_2 \\ m_3 \\ m_4 \\ m_5 \end{pmatrix} \equiv E\begin{pmatrix} x \\ x^2 - \sigma_v^2 \\ x^3 - 3\sigma_v^2 x \\ x^4 - 6\sigma_v^2 x^2 + (6 - \kappa_v)\sigma_v^4 \\ x^5 - 10\sigma_v^2 x^3 + 5(6 - \kappa_v)\sigma_v^4 x \end{pmatrix} = E\begin{pmatrix} x \\ x^2 - \sigma_v^2 \\ x^3 - 3\sigma_v^2 x \\ x^4 - 6\sigma_v^2 x^2 + \pi_v \\ x^5 - 10\sigma_v^2 x^3 + 5\pi_v x \end{pmatrix}.$$

If we now also impose Assumption 3.1 (iv), we get $\pi_v = 3\sigma_v^4$. Then $m_4$ and $m_5$ in (13) reduce to $m_4 = x^4 - 6\sigma_v^2 x^2 + 3\sigma_v^4$ and $m_5 = x^5 - 10\sigma_v^2 x^3 + 15\sigma_v^4 x$. The parameters of interest are α, β, γ, $\sigma_\varepsilon^2$ and $\sigma_v^2$, while the $\mu_k$s are the nuisance parameters.
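As a quick illustrative check (the distributions and the sample size below are our own assumptions, not taken from the paper), one can verify by simulation that the corrected moments $m_3$ and $m_4$ are unbiased for $\mu_3$ and $\mu_4$ when v is normal:

```python
# Simulation check that E(m_k) = mu_k for the corrected moments in Eq. (13).
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
sig2_v = 0.2                                   # assumed measurement-error variance
xi = rng.normal(1.0, 1.0, n)                   # assumed distribution of xi
x = xi + rng.normal(0.0, np.sqrt(sig2_v), n)   # observed regressor

m3 = x**3 - 3 * sig2_v * x
m4 = x**4 - 6 * sig2_v * x**2 + 3 * sig2_v**2  # uses zero excess kurtosis of v
print(m3.mean(), (xi**3).mean())               # both approximate mu_3
print(m4.mean(), (xi**4).mean())               # both approximate mu_4
```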

To estimate these parameters, we consider moment conditions involving moments up to order four, of which there are 2 + 3 + 4 + 5 = 14. We discard $E(xy^3)$ and $E(y^4)$, because they involve $\mu_7$ and $\mu_8$. We also ignore $E(y^3)$ and $E(x^2y^2)$, because they depend on $\mu_6$. Theoretically, dropping moments and parameters may entail a slight loss of efficiency in estimating the other parameters. We nevertheless believe that this effect will be small relative to the advantage of not using unstable higher-order moments.

We thus consider the moments $E(y)$, $E(xy)$, $E(x^2y)$, $E(x^3y)$, $E(y^2)$ and $E(xy^2)$. Elimination of α by centering the variables is not straightforward in a quadratic model, so we keep the intercept and refrain from centering. We equivalently consider the moments of $\bar{y} \equiv y - \alpha$ instead of y and of the $m_k$s instead of the powers of x. We write $m_1$ for x for the sake of transparency. The moment conditions linear in $\bar{y}$ that we exploit are
(14) $E(\bar{y}) = \beta\mu_1 + \gamma\mu_2$
(15) $E(m_1\bar{y}) = E[(\xi + v)(\beta\xi + \gamma\xi^2)] = \beta\mu_2 + \gamma\mu_3$
(16) $E(m_2\bar{y}) = E[(\xi^2 + v^2 - \sigma_v^2)(\beta\xi + \gamma\xi^2)] = \beta\mu_3 + \gamma\mu_4$
(17) $E(m_3\bar{y}) = E[(\xi^3 + 3\xi v^2 - 3\sigma_v^2\xi)(\beta\xi + \gamma\xi^2)] = \beta\mu_4 + \gamma\mu_5.$

After eliminating the $\mu_k$s, this yields
(18) $E(m_j\bar{y} - \beta m_{j+1} - \gamma m_{j+2}) = 0,$
for j = 0, 1, 2, 3 and with $m_0 \equiv 1$. The moment conditions quadratic in $\bar{y}$ that we consider are
$$E(\bar{y}^2) = E[(\beta\xi + \gamma\xi^2 + \varepsilon)^2] = \beta^2\mu_2 + 2\beta\gamma\mu_3 + \gamma^2\mu_4 + \sigma_\varepsilon^2 = E[(\beta m_1 + \gamma m_2)\bar{y}] + \sigma_\varepsilon^2,$$
$$E(m_1\bar{y}^2) = E[\xi(\beta\xi + \gamma\xi^2 + \varepsilon)^2] = \beta^2\mu_3 + 2\beta\gamma\mu_4 + \gamma^2\mu_5 + \mu_1\sigma_\varepsilon^2 = E[(\beta m_2 + \gamma m_3)\bar{y}] + \mu_1\sigma_\varepsilon^2.$$

After eliminating the μs, we find
(19) $E\big(\bar{y}^2 - (\beta m_1 + \gamma m_2)\bar{y} - \sigma_\varepsilon^2\big) = 0$
(20) $E\big(m_1\bar{y}^2 - (\beta m_2 + \gamma m_3)\bar{y} - \sigma_\varepsilon^2 m_1\big) = 0.$

Because (18) with j = 3 involves $\mu_5$, we drop this condition. We then collect (18) [j = 0, 1, 2], (19) and (20) and write the system of moment equations as $E[h_1(\theta; d)] = 0$, where $d \equiv (x, y)$, $\theta \equiv (\alpha, \beta, \gamma, \sigma_\varepsilon^2, \sigma_v^2)$ and
(21) $$h_1(\theta; d) \equiv \begin{pmatrix} \bar{y} - \beta m_1 - \gamma m_2 \\ m_1\bar{y} - \beta m_2 - \gamma m_3 \\ m_2\bar{y} - \beta m_3 - \gamma m_4 \\ \bar{y}^2 - (\beta m_1 + \gamma m_2)\bar{y} - \sigma_\varepsilon^2 \\ m_1\bar{y}^2 - (\beta m_2 + \gamma m_3)\bar{y} - \sigma_\varepsilon^2 m_1 \end{pmatrix}.$$

The method-of-moments estimator solves
(22) $$\frac{1}{n}\sum_{i=1}^{n} h_1(\theta; d_i) = 0.$$

The resulting estimator θ̂ will henceforth be referred to as “MM1” and its components are denoted by α̂,β̂,γ̂, …. It uses Assumptions 3.1 (i), (ii), (iii-a) and (iv).
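As an illustration only, the sketch below shows how the sample analogue (22) could be solved with a standard root finder. The helper names and the starting values are our own choices and not the authors' implementation:

```python
# A minimal sketch of MM1: solve (1/n) sum_i h1(theta; d_i) = 0, Eq. (22).
import numpy as np
from scipy.optimize import fsolve

def h1_bar(theta, x, y):
    """Sample means of the five moment functions in Eq. (21)."""
    alpha, beta, gamma, sig2_eps, sig2_v = theta
    ybar = y - alpha
    m1 = x
    m2 = x**2 - sig2_v
    m3 = x**3 - 3 * sig2_v * x
    m4 = x**4 - 6 * sig2_v * x**2 + 3 * sig2_v**2   # uses Assumption 3.1 (iv)
    h = np.array([
        ybar - beta * m1 - gamma * m2,
        m1 * ybar - beta * m2 - gamma * m3,
        m2 * ybar - beta * m3 - gamma * m4,
        ybar**2 - (beta * m1 + gamma * m2) * ybar - sig2_eps,
        m1 * ybar**2 - (beta * m2 + gamma * m3) * ybar - sig2_eps * m1,
    ])
    return h.mean(axis=1)

def mm1(x, y, start=None):
    """Hypothetical wrapper; the starting values are an arbitrary choice."""
    if start is None:
        start = np.array([y.mean(), 0.0, 0.0, y.var(), 0.1 * x.var()])
    return fsolve(h1_bar, start, args=(x, y))
```

In practice one would check the feasibility conditions on the returned variance estimates and, as the authors do, try several starting values to verify uniqueness of the solution.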

Alternatively, we can relax the assumption of no excess kurtosis and only assume symmetry of v. If we drop Assumption 3.1 (iv), the expectations of $m_4$ and $m_5$ in (13) contain $\pi_v = (6 - \kappa_v)\sigma_v^4$. We therefore have to estimate the extended parameter vector $\eta \equiv (\alpha, \beta, \gamma, \sigma_\varepsilon^2, \pi_v, \sigma_v^2)$. We note that we estimate $\pi_v$ instead of the kurtosis $\kappa_v$, because the underlying parameter transformation turned out to make it easier to find a numerical solution to the system of moment conditions.

Because of the additional parameter $\pi_v$, we add (18) with j = 3 to the moment conditions we already used for MM1. We collect (18) [j = 0, 1, 2, 3], (19) and (20) and write the system of moment conditions as $E[h_2(\eta; d)] = 0$. Our second method-of-moments estimator $\tilde\eta$, referred to as “MM2”, solves
(23) $$\frac{1}{n}\sum_{i=1}^{n} h_2(\eta; d_i) = 0.$$

This estimator uses Assumptions 3.1 (i), (ii), (iii-a) and (iii-b). The components of η˜ will be denoted by α˜,β˜,γ˜, ….

3.3. Asymptotic covariance matrix and identification

The method of moments is known to yield consistent and asymptotically normal estimators. To obtain the asymptotic covariance matrix corresponding to MM1, we need the Jacobian of the moment conditions with respect to the parameters as a function of the observed data. To obtain this matrix, we note that $\partial m_2/\partial\sigma_v^2 = -1$, $\partial m_3/\partial\sigma_v^2 = -3m_1$, $\partial m_4/\partial\sigma_v^2 = -6m_2$ and $\partial\bar{y}/\partial\alpha = -1$. The Jacobian writes as
$$G_1(\theta; d) = \begin{pmatrix} -1 & -m_1 & -m_2 & 0 & \gamma \\ -m_1 & -m_2 & -m_3 & 0 & \beta + 3\gamma m_1 \\ -m_2 & -m_3 & -m_4 & 0 & -\bar{y} + 3\beta m_1 + 6\gamma m_2 \\ -2\bar{y} + \beta m_1 + \gamma m_2 & -m_1\bar{y} & -m_2\bar{y} & -1 & \gamma\bar{y} \\ -2m_1\bar{y} + \beta m_2 + \gamma m_3 & -m_2\bar{y} & -m_3\bar{y} & -m_1 & (\beta + 3\gamma m_1)\bar{y} \end{pmatrix}.$$

With observed data $d_1, \ldots, d_n$, this yields
(24) $$\widehat{\mathrm{Var}}(\hat\theta) = \frac{1}{n}\left(\frac{1}{n}\sum_{i=1}^{n} G_1(\hat\theta; d_i)\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} h_1(\hat\theta; d_i)\,h_1(\hat\theta; d_i)'\right)\left(\frac{1}{n}\sum_{i=1}^{n} G_1(\hat\theta; d_i)'\right)^{-1}.$$
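A corresponding sketch of the sandwich formula (24), assuming hypothetical per-observation helpers h1_fn and G1_fn that return the moment vector and its Jacobian for a single observation:

```python
# Sandwich covariance estimator of Eq. (24) for the MM1 estimate theta_hat.
import numpy as np

def mm1_cov(theta_hat, x, y, h1_fn, G1_fn):
    n = len(x)
    G_bar = np.mean([G1_fn(theta_hat, xi, yi) for xi, yi in zip(x, y)], axis=0)
    H_bar = np.mean([np.outer(h1_fn(theta_hat, xi, yi), h1_fn(theta_hat, xi, yi))
                     for xi, yi in zip(x, y)], axis=0)
    G_inv = np.linalg.inv(G_bar)
    return G_inv @ H_bar @ G_inv.T / n
```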

The model parameters are (locally) identified if the expectation of the Jacobian has full rank. This yields our main identification result.

Result 3.1. Under Assumptions 3.1 (i), (ii), (iii-a) and (iv), E(G1(θ;d)) corresponding to MM1 fails to have full rank if γ = 0 or if μ1=μ2=μ3=β=0. In the latter case, ξ = 0 with probability 1. This is a trivial case that we assume not applicable. As to the former:

  1. If γ = 0, then γ is always identified.

  2. If γ = 0 and β = 0, then all parameters except σv2 are identified.

  3. If γ = 0 and the skewness of ξ is 0, then only γ is identified.

  4. In all other cases, all parameters are identified.

The proof of this result is in (online) Appendix A, supplementary material where we derive an explicit expression for the expectation of the Jacobian.

In finite samples and under misspecification, it is an empirical matter whether (22) has a unique solution $\hat\theta$ that satisfies the feasibility conditions $\hat\sigma_\varepsilon^2 \geq 0$ and $0 \leq \hat\sigma_v^2 \leq \hat\sigma_x^2$, with $\hat\sigma_x^2$ the sample variance of x. We will come back to the existence, uniqueness and feasibility of the solution in our simulation study in Section 6.

Similarly, we find that, under Assumptions 3.1 (i), (ii), (iii-a) and (iii-b), $E(G_2(\eta; d))$ corresponding to MM2 fails to have full rank if either $\mu_1 = \mu_3 = \beta = 0$ or γ = 0. This is shown in Appendix A, supplementary material, where we derive an explicit expression for the expectation of the Jacobian. Again the existence, uniqueness and feasibility of a solution of (23) is an empirical matter in finite samples and under misspecification. Feasibility means that $\tilde\sigma_\varepsilon^2 \geq 0$, $0 \leq \tilde\sigma_v^2 \leq \hat\sigma_x^2$ and $\tilde\pi_v \leq 6\tilde\sigma_v^4$, where the latter restriction follows from the non-negativity of the kurtosis. We will come back to this issue in our simulation study.

3.4. Error-free control variables

With an additional vector of error-free control variables $z \in \mathbb{R}^K$, (1) becomes
(25) $$y = \alpha + \beta\xi + \gamma\xi^2 + z'\lambda + \varepsilon.$$

For this extended model, consider the additional assumption that (ξ, z) is independent of both v and ε. We will refer to this as Assumption 3.1 (v).Footnote2 This assumption yields the moment conditions
(26) $$E[z(\bar{y} - \beta m_1 - \gamma m_2)] = E\big[z\big(\varepsilon - \beta v - \gamma(2\xi v + v^2 - \sigma_v^2)\big)\big] = 0.$$

The inclusion of additional error-free regressors is straightforward: redefine $\bar{y} = y - \alpha - z'\lambda$ in the moment conditions, add (26) to the moment conditions of either MM1 or MM2 and solve the resulting system of moment equations.
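A sketch of the extra moment conditions in (26), again with hypothetical variable names (Z is the n × K matrix of error-free controls):

```python
# Additional K moment conditions from Eq. (26) when error-free controls enter
# the model; ybar is redefined as y - alpha - z'lambda.
import numpy as np

def h_controls(alpha, beta, gamma, sig2_v, lam, x, y, Z):
    ybar = y - alpha - Z @ lam          # Z has shape (n, K), lam has length K
    m1 = x
    m2 = x**2 - sig2_v
    resid = ybar - beta * m1 - gamma * m2
    return (Z * resid[:, None]).mean(axis=0)   # K sample moment conditions
```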

4. Statistical test

This section proposes a statistical test to validate a necessary assumption for the consistency of MM1 and discusses its statistical properties.

4.1. Diagnostic testing in the errors-in-variables model

With only observed covariates, the econometric literature provides an extensive array of tools for diagnostic and goodness-of-fit testing of regression models. For example, if we used general-to-specific model selection, we would typically first estimate an unrestricted model and perform some diagnostic and goodness-of-fit tests. Depending on the outcomes of these tests, we would subsequently revise the model by strengthening certain assumptions and by estimating an adjusted, more parsimonious regression model using a more efficient estimator. We would iteratively repeat these steps until the diagnostic tests indicated that the model assumptions cannot be strengthened any further, given the data under consideration.

In the presence of an error-ridden variable, however, such an approach is usually not possible. A major issue is that we observe neither the unobserved regressor nor the measurement error, making it impossible to apply tests to them. Simply ignoring the presence of measurement error is not an option either, since conventional statistical tests typically do not have the usual asymptotic properties in the presence of measurement error. Tailor-made diagnostic testing and variable selection for the errors-in-variables model is still in an early stage (Blalock, Citation1965; Bloch, Citation1978; Carrillo-Gamboa and Gunst, Citation1992; Huang et al., Citation2005; Huang and Zhang, Citation2013; Nghiem and Potgieter, Citation2019; Zhao et al., Citation2020).

An additional complication is that changing a single assumption of the errors-in-variables model already requires substantial changes in the underlying estimation method to maintain consistency. This form of ill-conditionedness of the errors-in-variables model explains why many estimators for this model rely on the standard assumption that the unobserved regressor, measurement error and regression error are mutually independent. We follow this convention by maintaining the usual independence assumptions, but we propose a statistical test to verify a necessary condition for the consistency of MM1.

4.2. Test statistic

Let $\pi_{v,2} \equiv \mathrm{plim}_{n\to\infty}\tilde\pi_v$ and $\sigma_{v,2}^4 \equiv \mathrm{plim}_{n\to\infty}(\tilde\sigma_v^2)^2$. The restriction that we will test is $\pi_{v,2} = 3\sigma_{v,2}^4$. This property holds if v is symmetric and free of excess kurtosis, since MM2 is consistent in this case. We therefore use MM2 to construct a Wald test for testing the null hypothesis $H_0: \pi_v = 3\sigma_v^4$ against the alternative hypothesis $H_1: \pi_v \neq 3\sigma_v^4$. This yields the test statistic
(27) $$q_W = \big\{\tilde\pi_v - 3(\tilde\sigma_v^2)^2\big\}^2\,\big(R\,\widehat{\mathrm{Var}}(\tilde\eta)\,R'\big)^{-1},$$
where $R = (0, 0, 0, 0, 1, -6\tilde\sigma_v^2)$. We reject the null hypothesis at the u% significance level if $q_W > \chi^2_{1,1-u}$; otherwise we do not reject.
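A minimal sketch of this test, assuming the MM2 estimate (ordered as α, β, γ, σε², πv, σv²) and its estimated covariance matrix are available:

```python
# Wald statistic of Eq. (27) for H0: pi_v = 3 sigma_v^4.
import numpy as np
from scipy.stats import chi2

def wald_test(eta_tilde, var_eta, level=0.05):
    pi_v, sig2_v = eta_tilde[4], eta_tilde[5]
    R = np.array([0.0, 0.0, 0.0, 0.0, 1.0, -6.0 * sig2_v])  # gradient of pi_v - 3 (sigma_v^2)^2
    q_w = (pi_v - 3.0 * sig2_v**2)**2 / (R @ var_eta @ R)
    p_value = chi2.sf(q_w, df=1)
    return q_w, p_value, p_value < level
```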

To investigate the asymptotic size and power of the Wald test, we first discuss a few special cases. Under the null hypothesis that v has a symmetric distribution without excess kurtosis, MM2 is consistent such that $\pi_{v,2} = 3\sigma_{v,2}^4$. As a result, $q_W$ is asymptotically $\chi_1^2$ distributed, yielding an asymptotic rejection rate (size) of u. For symmetric alternatives with $\pi_v \neq 3\sigma_v^4$, MM2 is still consistent. Consequently, we must have $\pi_{v,2} \neq 3\sigma_{v,2}^4$, yielding an asymptotic rejection rate (power) of 1. Because both MM1 and MM2 assume a symmetric measurement-error distribution, we cannot construct a test for the symmetry assumption on the basis of these two estimators. For asymmetric alternatives, nevertheless, our Wald test will have an asymptotic rejection rate of 1 as long as $\pi_{v,2} \neq 3\sigma_{v,2}^4$ (Cameron and Trivedi, Citation2005, Ch. 7). Hence, as long as the inconsistency of MM2 causes $\pi_{v,2}$ to be different from $3\sigma_{v,2}^4$, the asymptotic power of the Wald test will be 1 for asymmetric alternatives.

We will investigate the finite-sample behavior of MM2 and the Wald test by means of a simulation study in Section 6, where we will consider both symmetric and asymmetric alternatives.

5. Benchmark approach

Before discussing the results of a simulation study, we explain the approach of Schennach and Hu (Citation2013). This approach will be used as a benchmark approach in our simulation study, together with OLS.

5.1. Sieve-based estimation

The semi-parametric estimator of Schennach and Hu (Citation2013) applies to general non-linear models of the form $y = g(\xi, \tau) + \varepsilon$, with g(·,·) a parametric function of the unobserved regressor and a finite-dimensional parameter vector τ. The joint density of the observables (y, x) is denoted by $f_{yx}$. This joint density depends on the marginal densities of the regression error ($f_1$), the measurement error ($f_2$) and the unobserved regressor ($f_3$) via the following integral equation:
(28) $$f_{yx}(y, x) = \int f_1\big(y - g(\xi, \tau)\big)\, f_2(x - \xi)\, f_3(\xi)\, d\xi.$$

Schennach and Hu (Citation2013) provide the conditions under which this equation is non-parametrically identified and thus yields a unique functional solution (τ,f1,f2,f3).

Schennach and Hu (Citation2013) propose a sieve-based approach using maximum-likelihood estimation. Thanks to the use of sieve densities, their approach does not require distributional assumptions such as symmetry of the measurement error. The method involves maximum likelihood estimation subject to non-linear parameter constraints. Applied to our quadratic specification, the log-likelihood function is given by
(29) $$L \equiv \sum_{i=1}^{n} \log \int f_1^*(y_i - \alpha - \beta\xi - \gamma\xi^2)\, f_2^*(x_i - \xi)\, f_3^*(\xi)\, d\xi.$$

The densities $f_1^*$, $f_2^*$ and $f_3^*$ are chosen to be sieve densities of the form
(30) $$f_k^*(z) = \Big(\sum_{j=0}^{s_k} \delta_j^k\, p_j(z)\Big)^2 \qquad [k = 1, 2, 3],$$
for unknown coefficients $\delta_0^k, \ldots, \delta_{s_k}^k$ and sieve smoothing parameters $s_k$. The functions $p_j(z)$ are orthonormal Hermite polynomials, with $p_j(z) = (\sqrt{\pi}\, j!\, 2^j)^{-1/2} H_j(z)\exp(-z^2/2)$, $H_0(z) = 1$, $H_1(z) = 2z$ and $H_{j+1}(z) = 2zH_j(z) - 2jH_{j-1}(z)$. Parameter constraints must be imposed to ensure that each of the three sieve densities integrates to unity and that the first two have mean zero:
(31) $$\sum_{j=0}^{s_k}(\delta_j^k)^2 = 1 \quad [k = 1, 2, 3]; \qquad \sum_{j=0}^{s_k - 1}\sqrt{2(j+1)}\,\delta_j^k\,\delta_{j+1}^k = 0 \quad [k = 1, 2].$$

Because of these parameter restrictions, we must have $s_k \geq 2$ for k = 1, 2 and $s_3 \geq 1$ to ensure that the resulting sieve densities have at least one free parameter left after the parameter conditions have been imposed.
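For illustration, a small sketch of a squared-Hermite sieve density of the form (30); the recursion and normalization follow the expressions above, and this is only a sketch of the density construction, not the estimation routine of Schennach and Hu (Citation2013):

```python
# Squared-Hermite sieve density f*(z) = (sum_j delta_j p_j(z))^2.
import numpy as np
from math import factorial, pi, sqrt

def hermite_p(j, z):
    """Orthonormal Hermite function p_j(z) built from the H_j recursion."""
    H_prev, H = np.ones_like(z), 2.0 * z          # H_0 and H_1
    if j == 0:
        H = H_prev
    else:
        for k in range(1, j):
            H_prev, H = H, 2.0 * z * H - 2.0 * k * H_prev
    norm = 1.0 / sqrt(sqrt(pi) * factorial(j) * 2.0**j)
    return norm * H * np.exp(-z**2 / 2.0)

def sieve_density(delta, z):
    """Integrates to one whenever sum_j delta_j^2 = 1, as in Eq. (31)."""
    s = sum(d * hermite_p(j, z) for j, d in enumerate(delta))
    return s**2
```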

Because $\sigma_\varepsilon^2$, $\sigma_v^2$, $\sigma_\xi^2$ and $\mu_1$ are all functions of the δs, the parameters in the parameter vector $\tau = (\delta_0^1, \ldots, \delta_{s_1}^1, \delta_0^2, \ldots, \delta_{s_2}^2, \delta_0^3, \ldots, \delta_{s_3}^3, \alpha, \beta, \gamma)$ are estimated jointly.

Schennach and Hu (Citation2013) show that the sieve-based approach is $\sqrt{n}$-consistent for $s_k \to \infty$ as $n \to \infty$, k = 1, 2, 3. In practice, the values of the sieve smoothing parameters $s_1$, $s_2$ and $s_3$ will have to be chosen in a data-driven way, for example using a cross-validation approach that aims to minimize a mean squared error. Such an approach would be too time-consuming for our simulation study and is therefore omitted. Instead, we will use the same set of smoothing parameters across different sample sizes.

In an empirical application, Schennach and Hu (Citation2013) obtain standard errors using a bootstrap procedure. Such a procedure would again be too time-consuming for our simulation study. In line with Schennach and Hu (Citation2013) and Garcia and Ma (Citation2017), we will therefore not report standard errors for the sieve-based estimates.Footnote3

If the quadratic errors-in-variables model contains error-free control variables as in (25), we condition all densities in (28) and (29) on these covariates. Because of the assumed independence, the densities $f_1$ and $f_2$ are not affected by this conditioning. Regarding $f_3$, we adopt the two-step estimation approach proposed by Schennach and Hu (Citation2013, p. 184). We first estimate $E(\xi|z)$ by regressing x on a constant and the vector of control variables z, yielding the estimated coefficient vector $\hat\zeta$. Subsequently, we replace $f_3^*(\xi)$ in (29) by $f_3^*(\xi - \hat\zeta' z)$ and $f_1^*(y_i - \alpha - \beta\xi - \gamma\xi^2)$ by $f_1^*(y_i - \alpha - \beta\xi - \gamma\xi^2 - z'\hat\lambda)$. We then proceed as above, but with the additional constraint that the sieve density $f_3^*$ has mean zero.

5.2. Comparison to method of moments

Table 2 compares the assumptions underlying our method-of-moments estimator MM1 and the sieve-based estimator of Schennach and Hu (Citation2013). Both estimators assume that ξ, v and ε are mutually independent. The main advantage of the method of Schennach and Hu (Citation2013) lies in its flexibility with respect to the functional form of $g(\xi, \tau)$, which does not have to be quadratic. The sieve-based approach is also relatively flexible with respect to the distribution of the measurement error, which is not required to be symmetric or free of excess kurtosis.

Table 2. Assumptions: method-of-moments vs. sieve-based estimation.

We note, however, that sieve densities will impose certain parametric restrictions in practice. This is due to the relatively low values of the numbers of terms sk that are usually selected in (30) for the sake of computational feasibility. Such restrictions are particularly relevant for the distributions of ξ and ε, on which the method of moments does not impose any assumptions. Hence, the sieve-based method will typically be more general in terms of the distribution of v, but less general regarding the distributions of ξ and ε.

If a vector of error-free control variables z is included in the quadratic measurement-error model as in (25), both approaches assume that (ξ, z), v and ε are mutually independent. The sieve-based approach additionally assumes that $E(\xi|z) = z'\zeta$ and that $[\xi - E(\xi|z)]\,|\,z$ does not depend on z. Our method-of-moments estimators do not require such assumptions.

6. Simulation study

We use Monte Carlo simulation to assess the performance of MM1, the Wald test, the sieve-based approach and OLS. In all simulation experiments, we take n = 500, 2,000, 3,000 and 5,000.

6.1. Normal measurement error

We start with the normal quadratic measurement-error model given by (1), with α = β = γ = 1, $\varepsilon \sim N(0, 2)$, $v \sim N(0, 0.2)$ and $\xi \sim N(1, 1)$. The model has an $R^2$ of 0.85 and a reliability of 0.83.Footnote4
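A sketch of this data-generating process (the seed and the use of NumPy are illustrative choices of ours):

```python
# Normal quadratic measurement-error model of Section 6.1.
import numpy as np

rng = np.random.default_rng(0)
n = 500
alpha, beta, gamma = 1.0, 1.0, 1.0
xi = rng.normal(1.0, 1.0, n)            # unobserved regressor
v = rng.normal(0.0, np.sqrt(0.2), n)    # measurement error, Var(v) = 0.2
eps = rng.normal(0.0, np.sqrt(2.0), n)  # regression error, Var(eps) = 2
x = xi + v                              # observed, error-ridden regressor
y = alpha + beta * xi + gamma * xi**2 + eps
```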

Because the measurement error in our simulation experiment is normally distributed, MM1 is consistent. The sieve-based approach of Schennach and Hu (Citation2013) is also consistent in this setting, provided that $s_k \to \infty$ as $n \to \infty$, k = 1, 2, 3. Because we use $s_1 = s_2 = s_3 = 6$ regardless of the sample size, the empirical implementation of the sieve-based estimator is formally inconsistent. Because of the flexibility of the sieve densities even for relatively low values of the smoothing parameters, we still expect the resulting estimator to perform well in terms of bias and standard deviation. However, we expect MM1 to have a smaller bias and to be more efficient, since it does not rely on approximating distributions but exploits the assumptions of symmetry and zero excess kurtosis.

The upper panel of Table 3 (“normal errors”) shows the results for MM1, the sieve-based approach and OLS. The rows captioned “bias” report the average value of the estimated parameter minus its true value. The rows captioned “s.d.” show the standard deviation of the estimated parameters, while the rows captioned “avg. σ̂” display the average estimated standard errors. These statistics are calculated as averages over all simulation runs for which the system of moment conditions has a unique solution. We verify the uniqueness of the solution by using different starting values for the root-solving routine. We confirm the existence of a unique solution in almost all simulation runs. Regardless of the sample size, the estimates of α, β, γ, $\sigma_\varepsilon^2$ and $\sigma_v^2$ as produced by MM1 are on average close to their true values. Also the average formula-based standard errors are close to the sample standard deviations. Also for smaller sample sizes, MM1 usually turns out feasible.Footnote5

Table 3. Simulation results: normal errors.

The biases of the sieve-based estimators are small in an absolute sense but larger than those associated with MM1. Part of this difference in bias may be caused by our non-optimal choice of the sieve smoothing parameters $s_k$. The biases of the sieve-based estimators of β and γ do not show the monotonic decrease with n that we may expect on the basis of the method's known consistency. The pattern in the biases is also likely to reflect our fixed choice of smoothing parameters and emphasizes the need for a data-driven choice to get optimal results. As expected, the results in the upper panel of Table 3 confirm that the OLS estimator is inconsistent.

We also consider the above quadratic errors-in-variables model with (demeaned) Poisson distributed regression errors. We choose the same regression error variance as before, which means that we set the Poisson parameter equal to 2. Because MM1 does not use the distribution of ε, we do not expect that this distributional change will substantially affect its performance. By contrast, the ability of the sieve estimator with low $s_1$ to approximate the discrete distribution of ε could be affected. These expectations are confirmed by the results shown in the lower panel of Table 3. Especially the bias of the sieve-based estimator of α turns out relatively large for $s_1 = s_2 = s_3 = 6$. The OLS bias continues to be large.

6.2. Non-normal measurement error

We take the same quadratic measurement-error model as before, but now with non-normal, symmetric measurement error. As before, we set α = β = γ = 1, $\varepsilon \sim N(0, 2)$ and $\xi \sim N(1, 1)$, but choose v either Laplace distributed (leptokurtic) or continuous-uniformly distributed (platykurtic) with mean 0 and variance 0.2. Because zero excess kurtosis is a necessary condition for the consistency of MM1, this estimator will be inconsistent in these two cases. We expect the sieve-based estimator, based on $s_1 = s_2 = s_3 = 6$, to be less inconsistent than MM1.

The estimation results are shown in the upper and lower panels of Table 4. Regardless of the sample size, the biases of $\hat\alpha$, $\hat\beta$ and $\hat\gamma$ are less than 10% of the true parameter values. The biases of $\hat\sigma_\varepsilon^2$ and $\hat\sigma_v^2$ are more substantial, though. The inconsistency of the underlying approach becomes apparent from the biases' lack of variation with n. As expected, the biases of the sieve-based estimators are small in an absolute sense. In comparison to MM1, however, the biases of the sieve-based estimates of α and γ are relatively large. As before, the biases of the sieve-based estimators do not show the convergence to zero that would be the case with optimal smoothing parameters. As expected, the bias of the OLS estimator is much larger than for the other two methods.

Table 4. Simulation results: non-normal symmetric measurement error.

We also consider the quadratic measurement-error model with non-symmetric measurement error. We use the same specification as used by Schennach and Hu (Citation2013) in their simulation experiments. We thus set α = 0, β = γ = 1 and $\varepsilon \sim N(0, 0.9)$. Furthermore, ξ is a mixture of N(0, 1) and N(0.2, 0.25) random variables with weights 0.6 and 0.4, respectively. We take v (demeaned) minimum-Gompertz distributed, with parameters a = 0.5772b and b = 1/2, where 0.5772 denotes the Euler-Mascheroni constant. Due to the substantial measurement-error variance of about 0.4, the reliability in this model is lower than in the previously considered normal model (0.64 versus 0.83). The model's $R^2$ is also lower than before and equals almost 0.70. Again we expect the sieve-based estimator, based on $s_1 = s_2 = s_3 = 5$, to be less inconsistent than MM1.

The estimation results are shown in the upper panel of Table 5 (“Gompertz measurement error”). We observe that MM1 is more biased than in the previous simulation experiments. This holds particularly true for the estimators of γ and $\sigma_v^2$. The increased bias is due to the combination of a leptokurtic, asymmetric measurement-error distribution and a reduced reliability.

Table 5. Simulation results: Gompertz errors.

We next consider the Gompertz measurement-error model with non-normal regression errors. The distribution of the regression errors is (demeaned) minimum-Gompertz, with parameters a = 0.5772b for b = 3/4, while the remaining distributions and parameters are the same as before. Because MM1 does not rely on the distribution of ε, we would not expect this distributional change to affect its bias. The results in the lower panel of Table 5 (“Gompertz errors”) confirm this. The ability of the sieve estimator with low $s_1$ to approximate the distribution of ε could be affected, as it did before when we considered Poisson regression errors. However, this time we do not observe the latter effect for the sieve-based estimators; the results in the lower panel of Table 5 are very similar to those in the upper panel.

6.3. Error-free control variables

We consider two simulation experiments for the quadratic normal-measurement error models with a single error-free control variable z, such that $z'\lambda$ in (25) reduces to λz. In both simulation experiments, Assumption 3.1 (v) is satisfied. In the first experiment, we take α = β = γ = λ = 1, $\varepsilon \sim N(0, 2)$, $v \sim N(0, 0.2)$, $z \sim N(0.5, 0.5)$ and $\xi|z \sim N(0.75z, 0.75)$. In the second experiment, we choose α = λ = 1, β = γ = 0.5, $\varepsilon \sim N(0, 2)$, $v \sim N(0, 0.2)$, $\xi \sim N(1, 1)$ and $z|\xi \sim N(\xi, 1)$, such that $E(\xi|z)$ is non-linear. In both models, the $R^2$ and the reliability have a value of 0.83.

Because the functional form of E(ξ|z) does not matter for the consistency of MM1, we expect good results for MM1 in both experiments. For the sieve-based approach (s1=s2=s3=6), the two-step approach described at the end of Section 5.1 will only be consistent in the first simulation experiment.

The estimation results in Appendix C (supplementary material) confirm our expectations. In both simulation experiments, the biases of MM1 are small in an absolute sense and vanish as n increases. In the first simulation experiment, the biases of the sieve-based estimators are also small, but larger than those associated with MM1. In the second experiment, the biases of the sieve-based estimators are much larger, both in an absolute sense and relative to MM1. For values of n larger than 500, the bias of the sieve-based estimator of β is even larger than for OLS.

6.4. Wald test

Before turning to the performance of the Wald test in previously considered simulation experiments, we globally discuss the behavior of the auxiliary estimator MM2 in the simulations. We first observe that the underlying system of moment equations does not always yield a feasible solution that satisfies $\tilde\sigma_\varepsilon^2 \geq 0$, $0 \leq \tilde\sigma_v^2 \leq \hat\sigma_x^2$ and $\tilde\pi_v \leq 6\tilde\sigma_v^4$. Most of the time, infeasibility is caused by violation of the last constraint. In a small percentage of the simulation runs, there is no solution at all.Footnote6

Our simulation experiments confirm that MM2 is consistent, but show that it may turn out inefficient relative to MM1 if v is normal, depending on the coefficients of interest.Footnote7 Similarly, they confirm that if v is symmetric with non-zero excess kurtosis, MM2 is consistent, as opposed to MM1. In the asymmetric case, the relative performance of MM1 and MM2 is an empirical matter, since both estimators will usually be inconsistent. In the Gompertz case, our simulation results show that the biases of α̂,β̂ and γ̂ (MM1) are smaller than those of α˜,β˜ and γ˜ (MM2). However, MM2 produces a less biased estimate of σv2. As a possible explanation for this performance difference, we note that MM1 erroneously imposes πv=3σv4 (Assumption 3.1 (iv)) but does not assume E(v5)=0 (Assumption 3.1 (iii-b)), unlike MM2. Apparently, the former assumption is less detrimental to the consistent estimation of α, β and γ than the latter, while the opposite holds for σv2.

Our approach consists of running the Wald test whenever MM1 and MM2 both uniquely exist. This is virtually always the case in our simulations.Footnote8 Table 6 reports the empirical rejection rates of our Wald test in each of the eight simulation experiments considered previously. In the four cases with normal measurement error, these rejection rates reflect the empirical size of the Wald test. We see that these rejection rates are close to nominal. The rejection rates for the other simulation experiments reflect the empirical power of the Wald test. With Laplace and uniform measurement error, the empirical power starts at a relatively low level and increases slowly with n. The low finite sample power arises from the fact that MM1's bias is only small in the presence of symmetric measurement error with non-zero excess kurtosis and modest variance. Only moment condition (11) does not hold, which results in an estimator whose inconsistency is relatively modest. Because the bias of MM1 is only small in these symmetric cases, the low power of the test poses less of a practical problem here. In the two Gompertz cases, however, the inconsistency of MM1 is more severe. Here the Wald test's empirical power is already high for n = 500 and reaches the value 1 quickly.

Table 6. Empirical size and power of Wald test.

6.5. Outlier sensitivity

Because the impact of extreme values on sample means is amplified by raising these large values to a power up to order four (MM1) or five (MM2), our method-of-moments estimators could be sensitive to outliers. We investigate the outlier sensitivity by means of simulation. For this purpose, we return to the model of Section 6.1 with normal measurement error and regression error. In this adjusted simulation experiment, both ξ and ε contain 25 randomly placed outliers. The fixed number of outliers implies that their presence becomes less of an issue as n grows, which seems a realistic setup. These outliers have a positive or negative sign with probability 0.5 and their fixed magnitude is $q\sigma_\xi^2$ and $q\sigma_\varepsilon^2$ (q = 2, 4, 5), respectively. To save space, the simulation results for MM1 are shown in Appendix C (supplementary material).
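One possible reading of this outlier scheme is sketched below; whether the outliers replace or are added to the original draws is not spelled out above, and the hypothetical helper assumes replacement:

```python
# Hypothetical outlier mechanism: place n_out outliers of fixed magnitude
# q * sigma2 with random sign at random positions in the series u.
import numpy as np

def add_outliers(u, sigma2, q, n_out=25, seed=0):
    rng = np.random.default_rng(seed)
    u = u.copy()
    idx = rng.choice(len(u), size=n_out, replace=False)
    signs = rng.choice([-1.0, 1.0], size=n_out)
    u[idx] = signs * q * sigma2        # assumed: outliers replace the original values
    return u
```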

MM1 still feasibly exists in most simulation runs, even for the smaller sample sizes. But we observe that the presence of outliers tends to increase the bias of the estimated coefficients. This holds true especially for σε2. Also the standard deviation and average formula-based standard error of σε2 increase substantially due to the presence of outliers. This effect becomes particularly apparent for q = 4, 5 and n = 500.

The simulation results reveal similar outlier effects for MM2 as for MM1.Footnote9 However, the percentage of simulation runs in which MM2 feasibly exists is relatively low for q = 4, 5 and n = 500. For example, if we take q = 5 and n = 500, then MM2 uniquely (feasibly) exists in 83.0% (18.3%) of the simulation runs. Further inspection shows that it is usually MM2’s infeasibility of σv2 that is a problem in these simulations. For MM1, the two percentages are both 99.1%. Hence, MM2 is more sensitive to outliers than MM1 in terms of feasibility. The simulation results in the Appendix, supplementary material additionally show that our Wald test exhibits more overrejection if the magnitude of the outliers increases.

We conclude that our method-of-moments approach requires us to remain alert for outliers, especially if the sample size is relatively small.

6.6. Empirical strategy

Under the assumption of Schennach and Hu (Citation2013) that “measurement error is not sufficiently severe to completely alter the shape of the specification,” we recommend considering our method-of-moments estimator as a potential candidate if OLS reveals a quadratic relation. Based on our analysis and simulations, we propose the following strategy to determine if MM1 should be used. If both MM1 and MM2 exist and the Wald test fails to reject, we recommend MM1 as the estimator of the quadratic errors-in-variables model. If the Wald test rejects, we recommend the approach of Schennach and Hu (Citation2013) instead. We also recommend the latter approach if either MM1 or MM2 does not exist. However, we advise to remain alert for possible misspecification in such cases, especially in the presence of error-free control variables z. That is, in the presence of such regressors, the method of Schennach and Hu (Citation2013) imposes certain assumptions on the conditional distribution of ξ given z; see the discussion in Section 5.2. Our simulations in Section 6.3 have shown that imposing these assumptions may lead to serious bias if they do not hold.

7. Empirical application

Our empirical application uses housing data from Harrison and Rubinfeld (Citation1978).Footnote10 This data set contains information on 506 geographical neighborhoods (census tracts) in the Boston Standard Metropolitan Statistical Area in 1970. The dependent variable of interest is the median value of the owner-occupied homes in the census tract. The average median value of the homes in the data set equals $22,523, with a standard deviation of $9,182.

We assume that there is a single unobserved regressor of interest, namely the percentage of the population in the census tract with a lower socio-economic status. This percentage is measured as the equally-weighted average of the percentage of adults without some high-school education and the percentage of male workers classified as laborers. On average, the observed percentage of lower status population equals 12.7%, with a standard deviation of 7.1%. We informally investigate the normality of the log of the observed percentage of lower status population by drawing a QQ plot; see the first graph in Figure 2. The dashed line in this graph represents the 45 degree line, corresponding to the standard normal distribution. We observe some deviations from normality in the right tail, which is less heavy than in the normal case.

Figure 2. Boston housing data.

Notes: The QQ plot in the left-hand-side figure applies to the log of the observed percentage of lower status population (after standardization). The dashed line indicates the 45 degree line, corresponding to the standard normal distribution. In the right-hand-side figure, the open dots reflect the observed data, while the closed dots correspond to the OLS-based predicted (=expected) log housing value.


We consider the quadratic measurement-error model specified by
(32) $$\log(y) = \alpha + \beta\log(\xi) + \gamma[\log(\xi)]^2 + z'\lambda + \varepsilon; \qquad \log(x) = \log(\xi) + v,$$
where y is the median value of the owner-occupied homes in the census tract expressed in thousands of dollars, x the observed percentage of lower status population, ξ the true percentage of lower status population, v the measurement error and ε the regression error. The vector $z = (z_1, z_2, z_3, z_4)$ includes four additional explanatory variables that were also used by Wooldridge (Citation2012): $z_1$ is the average number of rooms per house, $z_2$ the log of the nitric oxides concentration in parts per 10 million, $z_3$ the log of the weighted distance in miles to five Boston employment centers and $z_4$ the pupil-teacher ratio in the neighborhood. We assume these covariates to be free of measurement error. Detailed sample statistics for the dependent and explanatory variables are given in Appendix D (supplementary material).

We proceed as in Sections 3.4 and 4.2 to obtain the method-of-moments estimator MM1 under Assumptions 3.1 (i) – (v). Subsequently, we run the Wald test to verify the assumption of no excess kurtosis. To obtain the sieve-based estimator, we follow the two-step approach outlined in Section 5.1 and make the required assumptions about the distribution of log(ξ)|z.

7.1. Benchmark approaches

As mentioned in the introduction, we maintain the assumption of Schennach and Hu (Citation2013) that “measurement error is not sufficiently severe to completely alter the shape of the specification.” On the basis of OLS and the Bayesian Information Criterion (BIC), we conclude that we have to include log(ξ) and $[\log(\xi)]^2$ to parsimoniously capture the relation between log(y) and log(ξ) but that higher-order terms are not required.Footnote11 We therefore continue with the model that has the lowest BIC value, which is the model with linear and quadratic terms, but no cubic terms. The corresponding OLS estimation results are shown in the left-most panel of Table 7.

Table 7. Estimation results for the Boston housing data.

We observe that the OLS estimate of γ is significantly negative according to the 90% bootstrap-based confidence interval that is reported in Table 7. As a result, the estimated relation between the expected log housing value and the log percentage of lower status population is described by a parabola that opens downwards. For each observation, we display the OLS-based predicted log housing value in the second graph of Figure 2, together with a scatter plot of log(x) and log(y). The curve in Figure 2 shows that we expect lower log housing values for neighborhoods with a higher log percentage of lower status population.

We use the 5% and 95% sample quantile of x to determine a relevant range of values for ξ. For this range of values, we obtain the OLS-based elasticity of y with respect to ξ; i.e., the marginal effect of log(ξ) on log(y). We visualize these results in Figure 3 and observe that housing values are inelastic in all neighborhoods.
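In the log-log quadratic specification (32), the elasticity of y with respect to ξ equals β + 2γ log(ξ). A small sketch of how the elasticity curve over the 5%–95% quantile range could be computed (hypothetical variable and function names):

```python
# Elasticity implied by the quadratic-in-logs model: beta + 2 * gamma * log(xi).
import numpy as np

def elasticity_curve(beta_hat, gamma_hat, log_x, n_grid=50):
    lo, hi = np.quantile(log_x, [0.05, 0.95])   # evaluation range from observed log(x)
    grid = np.linspace(lo, hi, n_grid)
    return grid, beta_hat + 2.0 * gamma_hat * grid
```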

Figure 3. Comparison of different estimators.

Notes: For each method, the solid lines in the left-hand-side figure show the estimated elasticity of the housing value with respect to the percentage of lower status, plotted as a function of the log percentage of lower status. The dashed lines constitute the corresponding 90% pointwise confidence interval based on a bootstrap with replacement. The log percentage of lower status population is taken between the 5% and 95% quantile of the observed log percentage of lower status population. In the right-hand-side figure, the three methods are compared in terms of the predicted log housing value (in $1,000) as a function of the log percentage of lower status. Here, the additional control variables have been set at their sample medians.


Subsequently, we estimate the quadratic measurement-error model using the sieve-based approach. In line with our simulation experiments, we use a trial-and-error procedure to determine the sieve smoothing parameters. This procedure entails that the values of the $s_k$s are increased until the resulting maximum likelihood estimates do not change any further, which yields the values $s_1 = s_2 = s_3 = 4$. In line with Schennach and Hu (Citation2013), we use a standard bootstrap with replacement to obtain the corresponding standard errors. The estimation results are shown in the middle panel of Table 7. According to the sieve-based method, the estimate of γ is significantly negative. We next calculate the estimated elasticity of y with respect to ξ and visualize the results in the first graph of Figure 3. We observe that housing values are either inelastic (low and medium percentage lower status) or unit elastic (very high percentage lower status).

According to both OLS and the sieve-based approach, the estimated coefficients of the additional control variables are significant and have the expected signs. However, on the basis of our simulation experiments, we note that some caution is required here. We have seen that the bias of the OLS estimator can be substantial in the presence of measurement error. Furthermore, the sieve-based approach assumes that E(log(ξ)|z) = z′ζ and that the distribution of log(ξ) − E(log(ξ)|z), conditional on z, does not depend on z. Our simulation results have shown that erroneously imposing these assumptions may also induce substantial bias.

7.2. Method of moments

Lastly, we use MM1 to estimate the quadratic measurement-error model. The system of moment equations has a unique and feasible solution. Estimation results, including bootstrap-based standard errors, are reported in the right-most panel of Table 7. The estimate of γ is significantly negative. Furthermore, the estimate of the measurement-error variance is σ̂v² = 0.064, which translates into a reliability of 82%. Our empirical strategy recommends performing the Wald test, based on the auxiliary method-of-moments estimator. The latter estimator turns out to have a unique solution. We use a bootstrap-based version of the Wald test, which yields a p-value of 0.494. Hence, our test provides no evidence against the consistency of MM1.Footnote12
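For reference, the reliability figure follows from the usual definition under the additive measurement-error model on the log scale, reliability = Var(log(ξ))/Var(log(x)) = 1 − σv²/Var(log(x)). A hedged R sketch of the conversion, where lx again denotes the observed log percentage of lower status:

# Reliability implied by an estimated measurement-error variance, assuming
# log(x) = log(xi) + v with v independent of log(xi).
reliability <- function(lx, sigma_v2) 1 - sigma_v2 / var(lx)
# With the reported estimate 0.064, a reliability of about 0.82 corresponds
# to a sample variance of lx of roughly 0.064/0.18, i.e., about 0.36.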

The first graph in Figure 3 visualizes the estimated elasticity of y with respect to ξ as a function of log(ξ), from which we conclude that housing values are inelastic (low percentage lower status), unit elastic (medium percentage lower status) or elastic (high percentage lower status). For medium to high percentages of the lower status population, the OLS-based elasticity curve lies significantly above the one based on MM1, in the sense that the former curve does not fall within the confidence bounds of the latter. The elasticity curve based on the sieve-based method falls in between the other two curves, but is relatively close to the OLS-based curve. The difference between the elasticity curves produced by OLS and MM1 is consistent with the effect of attenuation on the OLS estimates, as discussed in Section 2. To illustrate more directly that the graph based on MM1 has more curvature, the second graph in Figure 3 displays the predicted (i.e., expected) value of log(y) as a function of log(ξ) for each of the three estimators.

According to MM1, the number of rooms has a significantly negative marginal effect, which seems counter-intuitive. We first investigate whether this finding is due to outliers, since our simulation results have shown that outlier sensitivity may be an issue for smaller sample sizes. We winsorize the dependent and explanatory variables at the 95% level and re-estimate the model. This adjustment leads to very little change in the sign, magnitude and significance of the estimated coefficients, suggesting that the counter-intuitive finding is not due to outliers.Footnote13 We provide two alternative explanations. First, Sirmans et al. (Citation2005) and Zietz et al. (Citation2008) address the insignificant or significantly negative coefficients of the number of (bed- or bath-)rooms that have shown up in certain studies. Zietz et al. (Citation2008) argue that particular housing characteristics are priced differently for houses in the upper-price range as compared to houses in the lower-price range and recommend quantile regression to deal with this variation in pricing. Hence, the significantly negative sign of the coefficient estimate of the number of rooms may indicate that the standard quadratic location-shift regression model that we adopted is too restrictive. We refer to Chesher (Citation2017) for a discussion of the effect of measurement error on the estimation of quantile regression functions. Consistent estimation of the quantile regression model in the presence of measurement error is also discussed in Schennach (Citation2008) and Wei and Carroll (Citation2009). We note, however, that these studies make use of side information in the form of instrumental variables and replicated measurements, respectively. A second possible explanation for the counter-intuitive sign is endogeneity due to simultaneity or omitted variables. Such a situation would require an approach that can deal with both measurement error and additional sources of endogeneity; see, e.g., Hu et al. (Citation2015), Song et al. (Citation2015) and Hu et al. (Citation2016).
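A sketch of this robustness check in R, under one common reading of winsorizing at the 95% level (capping each variable at its 5th and 95th sample percentiles); the variable names are illustrative and only a subset of the regressors is shown:

# Winsorize the dependent variable and regressors, then re-estimate.
library(mlbench)
data(BostonHousing2)
d <- transform(BostonHousing2, ly = log(cmedv), lx = log(lstat))

winsorize <- function(x, lo = 0.05, hi = 0.95) {
  q <- quantile(x, c(lo, hi))
  pmin(pmax(x, q[1]), q[2])
}

d_w <- d
vars <- c("ly", "lx", "rm")                 # illustrative subset of variables
d_w[vars] <- lapply(d_w[vars], winsorize)
# Re-estimating on d_w and comparing the coefficient estimates with Table 7
# indicates whether the findings are driven by outliers.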

8. Discussion

This study has proposed a new consistent estimator for the quadratic errors-in-variables model, based on exploiting higher-order moment conditions. Our approach assumes a symmetric measurement-error (ME) distribution without excess kurtosis, but does not require any side information, such as a known measurement error variance, replicated measurements, or instrumental variables. We straightforwardly allow for one or more error-free control variables, which only requires the standard assumption that these regressors are independent of the measurement and regression errors. We have combined our estimator with a Wald-type statistical test to verify a necessary condition for its consistency.

Under the assumption that the measurement error does not alter the shape of the specification, we recommend considering our method-of-moments estimator as a potential candidate whenever OLS reveals a quadratic relation. On the basis of our theoretical analysis and simulation study, we recommend our estimator “MM1” as the final choice if the Wald test fails to reject. Especially if the sample size is small, we advise investigating the sensitivity of the estimation results to outliers in the data.

We mention a few directions for future research. Instead of using our Wald test to choose between MM1 (symmetric ME with zero excess kurtosis) and MM2 (symmetric ME), we may want to consider a different approach to obtain our final estimator. Because MM2, unlike MM1, is consistent even in the presence of excess kurtosis, an alternative possibility is to discard MM1 altogether and to resort to MM2 in all cases. Our simulation results have illustrated that this strategy does not necessarily lead to an estimator with a smaller bias or a lower variance, though. Furthermore, MM2 turned out to be relatively sensitive to outliers and small samples in terms of feasibility. Alternatively, we could resort to an approach that minimizes the final estimator’s Mean Squared Error (MSE). Methods such as shrinkage or model averaging could be used to strike an optimal balance between bias and variance. For a practical implementation of the latter approach, we refer to Lavancier and Rochet (Citation2016), who discuss a method that averages different estimators, at least one of which is consistent, in order to reduce the MSE of the final estimator. We note that, in the absence of symmetry, both MM1 and MM2 will typically be inconsistent. As a result, the benefits of model averaging remain theoretically unclear (Lavancier and Rochet, Citation2016). Preliminary estimation results in Lavancier and Rochet (Citation2016) for a specific example show that the model-averaging approach is robust to model misspecification, but further research would be required to extrapolate this conclusion to our method-of-moments estimators.

The quadratic model is the natural first extension of the linear model, and arguably the most common extension used in practice. In principle, our approach could be extended to higher-order polynomials, but this would require fitting moments of a very high order, which would often lead to unacceptably large sampling variability. For other functional forms, it may be more natural to transform the error-ridden covariate first and assume additive measurement error on the transformed scale, similar to what we did in our empirical application.

We mention two other directions for future research. The first is the consistent estimation of the quantile regression model in the presence of measurement error and in the absence of any side information, as suggested by our empirical application. The second is to relax the homoscedasticity implied by the independence assumption, which is often at variance with economic reality. We can extend our approach to handle heteroscedasticity, but only at the cost of using moments of an order well beyond four. This requires enormous sample sizes and is hence not attractive. An alternative is to go back to earlier literature and express the heteroscedasticity as a parametric function of the regressors, which in our case would involve the unobserved regressor; cf. Meijer and Mooijaart (Citation1996) and Meijer (Citation1998, Ch. 4). This option seems feasible, but our approach then evidently loses its relative simplicity. We emphasize, though, that this limitation is not unique to our approach (e.g., Garcia and Ma, Citation2017); heteroscedasticity remains a difficult issue to deal with and there is no simple escape by just using robust standard errors.

Acknowledgments

We thank participants of the 2018 Meeting of the Netherlands Econometric Study Group held at the University of Amsterdam for discussion and suggestions. We also thank Susanne Schennach and Yingyao Hu for discussions about their method and for sharing their Matlab code.

Laura Spierdijk gratefully acknowledges financial support from a Vidi grant (452.11.007) in the “Vernieuwingsimpuls” program of the Netherlands Organization for Scientific Research (NWO). Her work was also supported by the Netherlands Institute for Advanced Study in the Humanities and Social Sciences (NIAS-KNAW). The usual disclaimer applies.

Disclosure statement

No potential conflict of interest was reported by the authors.

Correction Statement

This article has been republished with minor changes. These changes do not impact the academic content of the article.

Notes

1 See also Chan and Mak (Citation1985), Moon and Gunst (Citation1995), Wolter and Fuller (Citation1982), Buonaccorsi (Citation1996), Cheng and Schneeweiss (Citation1998), Cheng and Van Ness (Citation1999) and Cheng et al. (Citation2000).

2 Similar to the previous independence relaxation, the assumption that (ξ, z) is independent of both v and ε can be relaxed to covariance restrictions of the form Cov(zⱼᵏ, x₁) = 0 for suitable values of k, with x₁ ∈ {v, ε} and j = 1, …, K.

3 More details of the computational implementation of the sieve-based approach are given in Appendix B.

4 The R² is defined as R² ≡ Var(α + βξ + γξ²)/Var(y) = A/(A + σε²), where A ≡ β²Var(ξ) + γ²Var(ξ²) + 2βγ Cov(ξ, ξ²).

5 Appendix C shows the exact percentage of simulation runs for which a unique (feasible) solution exists; see Table C.1.

6 The exact percentage of simulation runs with a unique (feasible) solution is shown in Appendix C, supplementary material; see Table C.1.

7 Detailed simulation results for MM2 are provided in Tables C.4 – C.7 in Appendix C, supplementary material.

8 This is shown in Appendix C; see Table C.1, supplementary material.

9 The simulation results for MM2’s outlier sensitivity can be found in Appendix C; see Table C.8, supplementary material.

10 In our empirical analysis, we use the data set BostonHousing2 from the R package mlbench; see https://search.r-project.org/CRAN/refmans/mlbench/html/BostonHousing.html and Gilley and Pace (Citation1996).

11 The values of the BIC in the models with only linear terms, linear and quadratic terms and linear, quadratic and cubic terms are −100.9671, −119.9552 and −119.1272, respectively.

12 Detailed estimation results for MM2 are provided in Table D.2 in Appendix D, supplementary material.

13 We do not report the estimation results after winsorization, because they are very similar to those in Tables 7 (MM1) and D.2 in Appendix D, supplementary material (MM2).

References