Original Articles

When Space Matters: Spatial Dependence, Diagnostics, and Regression Models

Pages 117-135 | Published online: 06 Apr 2010
 

Abstract

For researchers analyzing geographically oriented data, such as neighborhood or county crime rates, a serious threat to validity is the violation of OLS regression assumptions due to the spatial clustering of values for the variables in the analysis. In such circumstances, OLS regression estimates may be biased and inefficient. However, analytic methods for spatial data analysis allow formal assessments of spatial autocorrelation, as well as maximum likelihood spatial regression models. This paper presents discussions of the underlying logic of spatial effects, the impact of spatial dependence on non‐spatial regression models, methods for assessing spatial dependence, the implementation of spatial regression models, and an illustration of the techniques presented. Additional discussion and references to more advanced spatial analytic techniques are provided.

Notes

1. This paper is intended to be an introductory reference for researchers interested in spatial data analysis. To that end, I restrict my general discussion to cross‐sectional data analysis. More advanced techniques, discussed elsewhere, are available when estimating longitudinal or hierarchical models with spatial dependence. I will not discuss these techniques, but refer the reader to recent papers on the subject below. All examples presented are estimated using GeoDa statistical software, available at http://geodacenter.asu.edu/.

2. I will use the terms spatial dependence and spatial autocorrelation interchangeably throughout this text. However, it should be noted that spatial autocorrelation is a weaker form of strict spatial dependence.

3. Bishop contiguity is a third alternative to rook and queen contiguity in which neighbors are defined as those sharing a common vertex (Anselin 1988). However, in practice rook and bishop contiguity structures are used less often than queen contiguity definitions.
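The three contiguity rules can be illustrated on a regular grid. This is a minimal sketch only: real census tracts are irregular polygons, the function name `grid_neighbors` is hypothetical, and the sketch assumes the common grid convention in which bishop neighbors touch only at a corner.

```python
def grid_neighbors(n_rows, n_cols, rule="queen"):
    """Return {cell_index: sorted neighbor indices} for a regular grid.

    rook   -> cells sharing an edge
    bishop -> cells touching only at a corner (vertex)
    queen  -> cells sharing an edge or a corner
    """
    edge_steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    corner_steps = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    steps = {
        "rook": edge_steps,
        "bishop": corner_steps,
        "queen": edge_steps + corner_steps,
    }[rule]

    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            found = []
            for dr, dc in steps:
                rr, cc = r + dr, c + dc
                # Keep only steps that stay inside the grid.
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    found.append(rr * n_cols + cc)
            neighbors[r * n_cols + c] = sorted(found)
    return neighbors


center = 4  # middle cell of a 3 x 3 grid, cells numbered row by row from 0
rook = grid_neighbors(3, 3, "rook")[center]
bishop = grid_neighbors(3, 3, "bishop")[center]
queen = grid_neighbors(3, 3, "queen")[center]
print("rook:", rook)
print("bishop:", bishop)
print("queen:", queen)
```

On this grid the queen neighbor set is the union of the rook and bishop sets, which is why queen contiguity yields the most connected structure of the three.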

4. For alternative methods of assessing spatial autocorrelation, refer to Cliff and Ord (1973, 1981). Global measures of spatial autocorrelation such as Moran’s I and Geary’s C refer to the extent of spatial patterning for an entire system of areal units (i.e., a lattice system). See Cressie (1993) for methods appropriate for distance‐based weights. Additionally, there are local indicators of spatial autocorrelation (LISA) that describe the contribution of each observation to the global measure (Anselin 1995; Getis and Ord 1992; Ord and Getis 1995).
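The two global measures named in this note can be computed directly from a weights matrix. The sketch below uses the standard textbook definitions of Moran's I and Geary's C; the four-unit "line" of areas, its binary weights, and the values are made up for illustration and are not the St. Louis data.

```python
def line_weights(n):
    """Binary weights for n units arranged in a row: adjacent units are neighbors."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = 1.0
        w[i + 1][i] = 1.0
    return w


def morans_i(values, weights):
    """Moran's I: (n / S0) * sum_ij w_ij * d_i * d_j / sum_i d_i^2, d = deviation from mean."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


def gearys_c(values, weights):
    """Geary's C: ((n - 1) / (2 * S0)) * sum_ij w_ij * (x_i - x_j)^2 / sum_i d_i^2."""
    n = len(values)
    mean = sum(values) / n
    s0 = sum(sum(row) for row in weights)
    sq_diff = sum(
        weights[i][j] * (values[i] - values[j]) ** 2 for i in range(n) for j in range(n)
    )
    return ((n - 1) / (2.0 * s0)) * sq_diff / sum((v - mean) ** 2 for v in values)


values = [1.0, 2.0, 3.0, 4.0]  # hypothetical rates that rise smoothly across space
w = line_weights(4)
moran = morans_i(values, w)
geary = gearys_c(values, w)
print(moran, geary)
```

For this smoothly rising pattern Moran's I is positive and Geary's C falls below 1, and both point to positive spatial autocorrelation.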

5. Technically, µ does not need to be homoskedastic. Anselin (1988) demonstrates spatial dependence models in which this assumption may be relaxed such that the error variance may be heteroskedastic. This would imply a situation of spatial heterogeneity, in which the structural model presented in Equation 1 may have different coefficient estimates across units in geographic space (see Anselin 1988; Baller et al. 2001). However, to simplify explanation here, I impose the assumption of constant error variance.

6. Anselin (1988) demonstrates that the spatial weights matrix W for ρ does not need to be equivalent to the W for λ. As with the assumption of homoskedastic errors, I impose this restriction to ease illustration of the model. In practice, however, the assumption that the W are equivalent is often made implicitly, as researchers rarely use two different weights matrices.

7. One of the difficulties of estimating spatial dependence models is the inducement of heteroskedasticity. This results as a function of imposing structure on the data through the spatially weighted parameters. It should also be noted that this issue influences not only the spatial error model, but also the spatial lag model. As seen in Equation 5, since the outcome y is estimated as a function of a spatial multiplier process, the error term is also influenced by the same process. Thus, inducing heteroskedasticity in a spatial lag model is possible as well (Anselin 1988).
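A small numerical sketch may help show why the multiplier process induces unequal error variances. It assumes the standard reduced form of the spatial lag model, y = (I − ρW)⁻¹(Xβ + ε), which is taken here to be what Equation 5 expresses (the equation itself is not reproduced in these notes). The three-unit layout, the row-standardized W, and ρ = 0.5 are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def spatial_multiplier(w, rho, n_terms=200):
    """Approximate (I - rho*W)^-1 by the series I + rho*W + rho^2*W^2 + ...

    The series converges when rho times the largest eigenvalue of W is below 1
    in absolute value, which holds for row-standardized W and |rho| < 1.
    """
    n = len(w)
    rho_w = [[rho * w[i][j] for j in range(n)] for i in range(n)]
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for _ in range(n_terms):
        term = matmul(term, rho_w)  # next power of rho*W
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total


# Three units in a row; the middle unit has two neighbors, the end units one each.
w = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]
multiplier = spatial_multiplier(w, rho=0.5)

# With i.i.d. unit-variance errors, Var[(multiplier @ eps)_i] is the sum of
# squared entries in row i of the multiplier.
error_variance = [sum(v * v for v in row) for row in multiplier]
print(error_variance)
```

Even though the underlying errors share one variance, the reduced-form error variance differs between the end units and the middle unit, because the units occupy different positions in the neighbor structure.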

8. The city of St. Louis had 113 census tracts in the 2000 census. However, three tracts had no population, thereby precluding the computation of crime rates, as well as the measurement of structural covariates obtained from census data.

9. The analysis here is intended to demonstrate the use of spatial dependence models, rather than constitute a formal test of social disorganization theory. Considerable research has expanded and elaborated on the original Shaw and McKay (1942) model to include intervening mechanisms such as collective efficacy, and account for spatial processes (see Bursik and Grasmick 1993; Morenoff et al. 2001; Sampson and Groves 1989; Sampson, Raudenbush, and Earls 1997).

10. Heterogeneity is measured as $1 - \sum_i p_i^2$, where $p_i^2$ is the squared proportion of the tract population in each of five racial categories: Whites, Blacks, Asian/Pacific Islander, Native American/Aleutian, and Others (Blau 1977; Britt 2000).
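As a worked illustration, the sketch below assumes the usual Blau (1977) form of the index, one minus the sum of squared group proportions. The tract counts are invented and the function name `blau_heterogeneity` is hypothetical.

```python
def blau_heterogeneity(counts):
    """Blau index: 1 - sum of squared group proportions (assumed form of the measure)."""
    total = sum(counts)
    if total == 0:
        # Tracts without population have no defined proportions.
        raise ValueError("heterogeneity is undefined for a tract with no population")
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical tract counts in the order: White, Black, Asian/Pacific Islander,
# Native American/Aleutian, Other.
homogeneous_tract = [1000, 0, 0, 0, 0]
evenly_mixed_tract = [200, 200, 200, 200, 200]

print(blau_heterogeneity(homogeneous_tract))
print(blau_heterogeneity(evenly_mixed_tract))
```

With five categories the index runs from 0 (everyone in one group) to 0.8 (equal shares across all five groups).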

11. Results were replicated using first‐order rook weights, as well as 5‐nearest and 10‐nearest neighbors. Since the specification of the correct neighbor structure may be debatable, replication of analyses under a variety of weighting structures is strongly suggested.

12. Significance levels for Moran’s I are based on a permutation process and normal approximation of I as a z‐score (Anselin et al. 2000; Cliff and Ord 1981).
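The permutation logic can be sketched as follows: values are reshuffled across locations many times, Moran's I is recomputed for each shuffle, and the observed statistic is compared with that reference distribution. This is an illustrative sketch rather than GeoDa's own routine; the eight-unit layout, the values, the 999 permutations, and the one-sided pseudo p-value convention are all assumptions.

```python
import random
import statistics


def morans_i(values, weights):
    """Moran's I for a list of values and a square weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


def permutation_test(values, weights, n_perm=999, seed=0):
    """Return (observed I, one-sided pseudo p-value, z-score vs. the permutations)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = morans_i(values, weights)
    shuffled = list(values)
    simulated = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between value and location
        simulated.append(morans_i(shuffled, weights))
    at_least_as_large = sum(1 for s in simulated if s >= observed)
    pseudo_p = (at_least_as_large + 1) / (n_perm + 1)
    z_score = (observed - statistics.mean(simulated)) / statistics.pstdev(simulated)
    return observed, pseudo_p, z_score


# Eight hypothetical units in a row, with low values on one side and high on the other.
n_units = 8
weights = [[0.0] * n_units for _ in range(n_units)]
for i in range(n_units - 1):
    weights[i][i + 1] = weights[i + 1][i] = 1.0
values = [1.0, 2.0, 2.0, 3.0, 7.0, 8.0, 8.0, 9.0]

observed, pseudo_p, z_score = permutation_test(values, weights)
print(observed, pseudo_p, z_score)
```

The smallest attainable pseudo p-value is 1 / (n_perm + 1), so the number of permutations sets the resolution of the reported significance level.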

13. In the OLS regression model, a Breusch‐Pagan test for heteroskedasticity yielded a statistic of 26.36 (p < .001). However, when the spatial lag of homicide rates was included in the model, the Breusch‐Pagan test statistic was 6.47 (p < .05). Thus, the inclusion of the spatial lag improved the model specification, but did not eliminate heteroskedasticity in the error term. As discussed above, some heteroskedasticity may be induced by the imposition of a specific spatial structure.
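For readers unfamiliar with the test, the sketch below shows the mechanics of one common variant, the studentized (Koenker) form of the Breusch‐Pagan statistic: squared OLS residuals are regressed on the covariate, and n times the R² of that auxiliary regression is referred to a chi-square distribution. Which variant produced the statistics reported above is not stated, so this choice is an assumption, and the single-covariate data are synthetic.

```python
import math


def simple_ols(x, y):
    """Fit y = a + b*x by least squares and return the fitted values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * xi for xi in x]


def r_squared(y, fitted):
    """Share of the variance in y explained by the fitted values."""
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1.0 - ss_resid / ss_total


def koenker_bp(x, y):
    """Studentized Breusch-Pagan statistic and p-value for one covariate (1 df)."""
    fitted = simple_ols(x, y)
    squared_resid = [(yi - fi) ** 2 for yi, fi in zip(y, fitted)]
    aux_fitted = simple_ols(x, squared_resid)  # auxiliary regression
    lm = len(x) * r_squared(squared_resid, aux_fitted)
    # Upper-tail probability of a chi-square variate with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value


# Synthetic data whose error spread grows with x, i.e. heteroskedastic by construction.
x = [float(i) for i in range(1, 21)]
y = [2.0 + 0.5 * xi + ((-1) ** i) * 0.1 * xi for i, xi in enumerate(x, start=1)]

lm, p_value = koenker_bp(x, y)
print(lm, p_value)
```

A large statistic relative to the chi-square critical value (3.84 at the 5 percent level with one degree of freedom) signals that the residual variance moves with the covariate.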

14. The Breusch‐Pagan test for the model in Table is 201.75 (p < .001), indicating that significant heteroskedasticity remains even with the additional covariates. Additional analyses would be undertaken to identify the source of the mis‐specification and estimate a model conforming to the regression assumptions. However, that analysis is beyond the scope of the current paper.
