Abstract
The political, social, and economic conditions that lead to inequality, poverty, and health disparities have distinct spatial footprints. Geographic information systems (GIS) are a collection of tools that can aid the social economist in investigating such phenomena. Geocoding is a technical procedure that matches attribute data to spatial features in GIS. This paper discusses the possibilities of spatial analysis for social economics and the technical process of geocoding. A novel iterative geocoding process is applied to 15 years of address-level pediatric data from a local children’s hospital in order to investigate the relationship between household environments and health outcomes. This procedure adheres to traditional standards for geocoding (positional accuracy, completeness, and repeatability) while producing multiple spatial data sets that can be associated with a range of environmental and socioeconomic variables related to human health. Describing this technical procedure contributes to a growing methodological toolbox for applying GIS to research in social economics.
Notes
1 Shapefiles are the class of files used in Geographic Information Systems.
2 For example, a best practice from economic analysis is to use as much of the available data as possible. Our process is designed with this in mind.
3 Absolute upward mobility is defined as “the fraction of children who earn more than their parents” (Chetty, Grusky, et al., 2016).
4 The parcel centroid is the central point in a parcel. It is the default location for a point shapefile in the geocoding process.
5 The centerline geography is a linear mapping of an area’s streets. In addition to being reference data for geocoding, the centerline geography is often used for network analysis. One example application of centerline networking is the development of routes for emergency vehicles.
6 In practice, this causes secondary addresses (e.g., 2002 ½ Main St.) associated with a parcel to fail to match.
7 The geocoding process described is designed to be extended to 238,756 observations of pediatric lead poisoning and 1,172,606 incidents of childhood injury.
8 This provides a consistent benchmark across different years and different surveys.
9 The iterative process is designed for repeatability and accuracy first, with completeness following from them. The absence of the “legal” field in 2000 and 2001 disables Iteration 4 of the match process.
10 The redlining data is drawn from the Mapping Inequality: Redlining in New Deal America project at the University of Richmond’s Digital Scholarship Lab.
11 The pseudo p-value was calculated based on 999 permutations.
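As an illustration of note 11, a permutation-based pseudo p-value is conventionally computed as (R + 1) / (M + 1), where M is the number of permutations and R is the number of permuted statistics at least as large as the observed one. With M = 999, the smallest attainable value is 0.001. The sketch below is a minimal, hedged example: the note does not specify the test statistic or whether the test is one-sided, so the upper-tail comparison and the toy `neighbor_product_sum` statistic are assumptions made only for illustration, and all function names are hypothetical.

```python
import random


def pseudo_p_value(observed, permuted_stats):
    """Return (R + 1) / (M + 1) for an upper-tail permutation test.

    R counts permuted statistics >= observed; M is the number of permutations.
    The one-sided comparison is an assumption, not taken from the source.
    """
    m = len(permuted_stats)
    r = sum(1 for s in permuted_stats if s >= observed)
    return (r + 1) / (m + 1)


def neighbor_product_sum(values):
    """Toy order-dependent statistic (hypothetical stand-in for a spatial
    autocorrelation measure): sum of products of adjacent centered values."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    return sum(a * b for a, b in zip(centered, centered[1:]))


def permutation_test(values, statistic, n_permutations=999, seed=0):
    """Shuffle values across locations and compare the observed statistic
    with its permutation distribution."""
    rng = random.Random(seed)
    observed = statistic(values)
    shuffled = list(values)
    permuted = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        permuted.append(statistic(shuffled))
    return observed, pseudo_p_value(observed, permuted)


# Usage: values ordered along a line, so neighbors are adjacent list entries.
observed, p = permutation_test([1, 2, 3, 4, 5, 6, 7, 8], neighbor_product_sum)
```

Adding 1 to both numerator and denominator treats the observed arrangement as one of the possible permutations, which is why the pseudo p-value can never be exactly zero.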