ABSTRACT
This paper proposes a new estimation procedure for the first-order spatial autoregressive (SAR) model, where the disturbance term also follows a first-order autoregression and its innovations may be heteroscedastic. The estimation procedure is based on the principle of indirect inference that matches the ordinary least squares estimator of the two SAR coefficients (one in the outcome equation and the other in the disturbance equation) with its approximate analytical expectation. The resulting estimator is shown to be consistent, asymptotically normal and robust to unknown heteroscedasticity. Monte Carlo experiments are provided to show its finite-sample performance in comparison with existing estimators that are based on the generalized method of moments. The new estimation procedure is applied to empirical studies on teenage pregnancy rates and Airbnb accommodation prices.
ACKNOWLEDGEMENTS
The authors are grateful to two anonymous referees, a co-editor and the editor-in-chief (Paul Elhorst) for their helpful comments. Jeff Ello from the Krannert Computing Center at Purdue University kindly created a virtual machine from a computer cluster to facilitate the simulations conducted in this paper. The authors are responsible for all remaining errors.
DISCLOSURE STATEMENT
No potential conflict of interest was reported by the authors.
NOTES
1 Two closely related papers are Liu and Yang (2015) and Breitung and Wigger (2018). They redefined the score function of the log-likelihood function such that the resulting moment conditions are in fact robust to heteroscedasticity and distributional assumptions.
2 The major difference between them is that the binding function in Kyriacou et al. (2019) comes from approximating the expectation of the ratio that defines the OLS estimator of the SAR parameter by the ratio of expectations, whereas in Bao et al. (2020) only the expectation of the numerator is taken. As a result, the SAR parameter appears in both the numerator and the denominator of the sample binding function in Kyriacou et al. (2019), but only in the numerator in Bao et al. (2020). The primitive condition on the invertibility of the binding function in Kyriacou et al. (2019) then seems to be more restrictive.
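The contrast described in this note can be written out for a pure SAR(1) model without regressors. This is only a sketch: the stripped-down model, the symbols $\lambda$, $W$, $\varepsilon$, $\sigma_i^2$ and $G(\lambda)$, and the use of population expectations are assumptions made here for illustration. The binding functions in the cited papers also accommodate regressors and replace the expectations by sample counterparts.

```latex
% Assumed model: y = \lambda W y + \varepsilon, with independent innovations,
% E(\varepsilon_i) = 0 and Var(\varepsilon_i) = \sigma_i^2.
% Define G(\lambda) = W (I_n - \lambda W)^{-1}, so that W y = G(\lambda)\varepsilon.

% OLS estimator of \lambda and its exact decomposition:
\hat{\lambda} = \frac{y' W' y}{y' W' W y}
             = \lambda + \frac{\varepsilon' G(\lambda)' \varepsilon}
                              {\varepsilon' G(\lambda)' G(\lambda) \varepsilon}.

% (a) Ratio of expectations: \lambda enters the numerator and the denominator.
\hat{\lambda} \approx \lambda
  + \frac{E[\varepsilon' G(\lambda)' \varepsilon]}
         {E[\varepsilon' G(\lambda)' G(\lambda) \varepsilon]}
  = \lambda + \frac{\sum_{i} \sigma_i^2 \, [G(\lambda)]_{ii}}
                   {\sum_{i} \sigma_i^2 \, [G(\lambda)' G(\lambda)]_{ii}}.

% (b) Expectation of the numerator only: the denominator stays at its
% observed value y' W' W y, which does not depend on \lambda.
\hat{\lambda} \approx \lambda
  + \frac{\sum_{i} \sigma_i^2 \, [G(\lambda)]_{ii}}{y' W' W y}.
```

In (a) the right-hand side is a ratio of two functions of $\lambda$, whereas in (b) only the numerator varies with $\lambda$, which is one way to see why inverting (a) may require a stronger condition.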
3 Kyriacou et al. (2017) used as the correction term for
for the SAR(1) model. This makes the asymptotic variance of the recentred
more complicated and it involves the kurtosis of the disturbance term under homoscedasticity.
4 It is beyond the scope of this paper to list a set of primitive conditions to ensure the existence and uniqueness of the root for any given sample. It will depend on the structure of the data matrix, the characteristics of the weight matrices and the parameter space. For a given sample, however, one can always plot the binding function against
to verify the validity of this assumption numerically.
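The numerical check suggested in this note can be sketched as follows. The sketch does not use the binding function of this paper: it assumes a pure SAR(1) model with homoscedastic innovations and the ratio-of-expectations form $\lambda + \mathrm{tr}(G)/\mathrm{tr}(G'G)$, with $G = W(I - \lambda W)^{-1}$, purely for illustration. The circular weight matrix, the grid and all function names are likewise hypothetical. Instead of plotting, it evaluates the function on a grid and counts how often it crosses the OLS estimate.

```python
import numpy as np


def circular_weights(n: int) -> np.ndarray:
    """Row-normalized weights: each unit's neighbours are the two adjacent units on a circle."""
    W = np.zeros((n, n))
    idx = np.arange(n)
    W[idx, (idx + 1) % n] = 0.5
    W[idx, (idx - 1) % n] = 0.5
    return W


def binding(lam: float, W: np.ndarray) -> float:
    """Illustrative (assumed) binding function: lam + tr(G) / tr(G'G), G = W (I - lam W)^{-1}."""
    n = W.shape[0]
    G = W @ np.linalg.inv(np.eye(n) - lam * W)
    return float(lam + np.trace(G) / np.trace(G.T @ G))


def ols_sar(y: np.ndarray, W: np.ndarray) -> float:
    """OLS estimator of the SAR coefficient in y = lam * W y + e (no regressors)."""
    Wy = W @ y
    return float(Wy @ y / (Wy @ Wy))


def count_crossings(values: np.ndarray, target: float) -> int:
    """Number of sign changes of (values - target) along the grid.

    Assumes no grid value equals the target exactly.
    """
    signs = np.sign(values - target)
    return int(np.count_nonzero(np.diff(signs)))


# Simulated sample (hypothetical design: n = 200, true coefficient 0.4).
rng = np.random.default_rng(0)
n, lam0 = 200, 0.4
W = circular_weights(n)
y = np.linalg.solve(np.eye(n) - lam0 * W, rng.standard_normal(n))

lam_ols = ols_sar(y, W)
grid = np.linspace(-0.9, 0.9, 181)  # assumed parameter space
values = np.array([binding(lam, W) for lam in grid])

print("OLS estimate:", lam_ols)
print("crossings of the OLS estimate on the grid:", count_crossings(values, lam_ols))
print("strictly increasing on the grid:", bool(np.all(np.diff(values) > 0)))
```

A single crossing on the chosen grid is what the uniqueness assumption requires for that sample. Zero or several crossings would suggest revisiting the parameter space or the weight matrix. Strict monotonicity over the whole grid is a stronger property than is needed and is reported only as a diagnostic.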
5 This follows similarly from Proposition 2 of Lin and Lee (2010).
6 While other choices are possible, in this paper, for the GMM estimator of Jin and Lee (2019), the vector of moment conditions is where
(where
denotes the part of
without the constant term),
and
; for the GS2SLS estimator of Kelejian and Prucha (2010), the matrix of IV is
in the first step and
with
and
is used as the moment conditions in the second step. With such choices of the moment conditions, the GMM and GS2SLS estimators are robust to heteroscedasticity. For both, the optimal two-step GMM estimation is used. The GEL estimator in Jin and Lee (2019) is not considered in this paper, as it is much more computationally intensive and Jin and Lee’s Monte Carlo studies showed that its improvement over GMM was marginal.
7 As a referee pointed out, under a SARAR(1,1) specification, puts at risk the identification of the spatial autoregressive parameters when their true values are near zero. This is not the case in the simulation set-up though.
8 The authors thank an anonymous referee for suggesting including negative spatial autoregressive parameters in the simulations.
9 The authors thank the editor-in-chief and an anonymous referee for suggesting this line of discussion.
10 When , the GS2SLS fails (in the sense of a failure of the optimization routine in Matlab R2020a, which is used for the numerical estimation in this paper) more than 50% of the time, but the GMM and II rarely fail. Under
and
, all three methods have virtually zero failure rates. reports the simulation results with successful optimizations for each estimator.
11 These additional results show that, as the weight matrices become denser, all three estimators perform less reliably in small samples. The II estimator usually performs relatively better among the three, but it may become more problematic in estimating accurately when each spatial unit has more neighbours.
12 The authors are grateful to Xu Lin for providing the teenage pregnancy rate data.
13 The sample is retrieved from a third-party website (http://insideairbnb.com/) that provides data collected from publicly available information at https://www.airbnb.com/. The sample contains 2247 accommodation offers in Asheville on 21 March 2020, including 1728 entire homes/apartments and 519 private rooms. Since only 10 shared rooms were available in Asheville on 21 March 2020, they are excluded from the sample.
14 With the choice of and
(see note 6), the (two-step optimal) GMM fails numerically. Instead, four quadratic moment conditions are used for the GMM estimator:
and
,
.