Methods, Models, and GIS

Validating Population Estimates for Harmonized Census Tract Data, 2000–2010

John R. Logan, Brian J. Stults, & Zengwang Xu
Pages 1013-1029 | Received 01 Aug 2015, Accepted 01 Apr 2016, Published online: 17 Jun 2016
 

Abstract

Social scientists regularly rely on population estimates when studying change in small areas over time. Census tract data in the United States are a prime example, as there are substantial shifts in tract boundaries from decade to decade. This study compares alternative estimates of the 2000 population living within 2010 tract boundaries to the Census Bureau's own retabulation. All methods of estimation are subject to error; this is the first study to directly quantify the error in alternative interpolation methods for U.S. census tracts. A simple areal weighting method closely approximates the estimates provided by one standard source (the Neighborhood Change Data Base), with some improvement provided by considering only area not covered by water. More information is used by the Longitudinal Tract Data Base (LTDB), which relies on a combination of areal and population interpolation as well as ancillary data about water-covered areas. Another set of estimates provided by the National Historical Geographic Information Systems (NHGIS) uses data about land cover in 2001 and the current road network and distribution of population and housing units at the block level. Areal weighting alone results in a large error in a substantial share of tracts that were divided in complex ways. The LTDB and NHGIS perform much better in all situations but are subject to some error when boundaries of both tracts and their component blocks are redrawn. Users of harmonized tract data should be watchful for potential problems in either of these data sources.
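
The difference between simple areal weighting and the variant that considers only area not covered by water can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure used by the NCDB, LTDB, or NHGIS: the tract labels, areas, and population are invented, and the function names `areal_weights` and `interpolate` are hypothetical.

```python
from typing import Dict, List, Tuple

# One record per intersection of a 2000 tract with a 2010 tract:
# (source_tract_2000, target_tract_2010, total_area, land_area)
# All values below are invented for illustration.
Overlap = Tuple[str, str, float, float]


def areal_weights(overlaps: List[Overlap],
                  land_only: bool = False) -> Dict[Tuple[str, str], float]:
    """Return the share of each 2000 tract's area falling in each 2010 tract.

    With land_only=True, water-covered area is ignored when computing shares.
    """
    idx = 3 if land_only else 2
    # Sum the relevant area over all pieces of each source tract.
    source_totals: Dict[str, float] = {}
    for rec in overlaps:
        source_totals[rec[0]] = source_totals.get(rec[0], 0.0) + rec[idx]
    weights: Dict[Tuple[str, str], float] = {}
    for rec in overlaps:
        total = source_totals[rec[0]]
        if total > 0:  # skip source tracts with no usable area
            weights[(rec[0], rec[1])] = rec[idx] / total
    return weights


def interpolate(pop_2000: Dict[str, float],
                weights: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Allocate each 2000 tract's population to 2010 tracts by weight."""
    estimates: Dict[str, float] = {}
    for (src, tgt), w in weights.items():
        estimates[tgt] = estimates.get(tgt, 0.0) + w * pop_2000[src]
    return estimates


# Hypothetical 2000 tract "A" split into 2010 tracts "X" and "Y".
# Half of the piece that falls in "Y" is water.
overlaps = [("A", "X", 6.0, 6.0), ("A", "Y", 4.0, 2.0)]
pop_2000 = {"A": 4000.0}

est_total_area = interpolate(pop_2000, areal_weights(overlaps))
est_land_area = interpolate(pop_2000, areal_weights(overlaps, land_only=True))
```

In this invented case, excluding water shifts part of the estimated population from "Y" to "X", because less of the source tract's habitable area lies in "Y" than its total area suggests.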

Funding

We gratefully acknowledge funding support from the Russell Sage Foundation's US2010 Project and from the Population Studies and Training Center at Brown University, which receives core support from the National Institute of Child Health and Human Development (5R24HD041020, 5T32HD007338).

Notes

1. Information about the LTDB is available from Brown University: http://www.s4.brown.edu/us2010/Researcher/Bridging.htm. Information about the NCDB is available from Geolytics: http://www.geolytics.com/USCensus,Neighborhood-Change-Database-1970–2000,Products.asp. The NHGIS methodology is summarized at https://www.nhgis.org/documentation/time-series/2000-blocks-to-2010-geog.

2. For methodological details, the Geolytics Web page refers users to the documentation of the NCDB's approach to the 1990 to 2000 estimates (http://www.geolytics.com/pdf/Appendix-J.pdf). Many blocks were reconfigured between censuses, and the NCDB used ancillary data from the streets coverage in TIGER/Line 1992 to bridge 1990 data to 2000 tract boundaries (Tatian 2003). This is an excellent methodology, but the estimation discrepancies shown here are larger than would be expected if the street network had been used as ancillary data.

3. Data are reported for 72,205 tracts of the 73,057 total tracts in 2010 in the fifty states and District of Columbia. A total of 318 tracts with no land area in 2010 are omitted, although many of these have estimated populations in the NCDB. Of the remaining tracts, 534 were affected by the Census Bureau's Count Question Resolution (CQR) program that resulted in revised population counts of more than 0.1 percent for 2000. In many of these cases, a large group quarters population was shifted from one tract to an adjacent one. The CQR cases are omitted from the analysis because these changes were not available at the time that the LTDB was completed.

4. It takes eleven digits to represent a census tract ID. The first two digits represent the state FIPS code, the next three digits represent the county FIPS code, and the last six digits are the census tract's FIPS code. In the text, after the first mention of a tract, we abbreviate to the last six digits.
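
The layout described in this note (two digits for the state, three for the county, six for the tract) can be sketched as a small parser. This is an illustrative sketch only: the names `TractId` and `parse_tract_id` are hypothetical, and the example ID is made up rather than taken from the article.

```python
from typing import NamedTuple


class TractId(NamedTuple):
    state: str   # two-digit state FIPS code
    county: str  # three-digit county FIPS code
    tract: str   # six-digit census tract code


def parse_tract_id(geoid: str) -> TractId:
    """Split an eleven-digit census tract ID into its three components."""
    if len(geoid) != 11 or not (geoid.isascii() and geoid.isdigit()):
        raise ValueError(f"expected eleven digits, got {geoid!r}")
    # Codes are kept as strings so that leading zeros are preserved.
    return TractId(state=geoid[:2], county=geoid[2:5], tract=geoid[5:])


# Made-up ID used only to show the slicing.
example = parse_tract_id("12345678901")
short_form = example.tract  # the six-digit abbreviation used after first mention
```

Keeping the components as strings rather than integers matters in practice, since many state and county codes begin with zero.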

5. The comparison of RMSE between the NCDB and areal interpolation methods seems paradoxical. The results suggest that discrepancies are greater for the NCDB for unchanged tracts, yet the reported values of RMSE for the NCDB and for areal interpolation including water areas are identical. This is possible because, although the NCDB has a higher share of cases with large errors (over 10 percent or larger than 500), the errors are actually larger in magnitude for areal interpolation including water areas (and these deviations are squared in the RMSE). We attach no theoretical or substantive significance to this result.
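
The point made in this note, that one method can have a higher share of large errors while another has the same RMSE because its fewer errors are larger, can be illustrated with a toy calculation. The error values below are invented for the illustration and are not drawn from the article's data; the helper names `rmse` and `share_large` are hypothetical.

```python
import math
from typing import List


def rmse(errors: List[float]) -> float:
    """Root mean squared error: squaring gives large deviations extra weight."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def share_large(errors: List[float], threshold: float = 500) -> float:
    """Share of cases whose absolute error exceeds the threshold."""
    return sum(1 for e in errors if abs(e) > threshold) / len(errors)


# Invented errors for eight tracts under two hypothetical methods.
errors_a = [600, 600, 600, 600, 0, 0, 0, 0]  # many moderately large errors
errors_b = [1200, 0, 0, 0, 0, 0, 0, 0]       # one very large error

share_a, share_b = share_large(errors_a), share_large(errors_b)
rmse_a, rmse_b = rmse(errors_a), rmse(errors_b)
```

Here method A exceeds the 500 threshold in half of the cases and method B in one of eight, yet both have the same sum of squared errors and therefore the same RMSE.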

6. The LTDB provides a “backwards crosswalk” to estimate 2010 population data in 2000 tract boundaries that can be used for this purpose. It is available at http://www.s4.brown.edu/us2010/Researcher/LTDB2.htm.

7. The census tract population change file is available at https://www.census.gov/population/metro/data/c2010sr-01patterns.html.

Additional information

Notes on contributors

John R. Logan

JOHN R. LOGAN is Professor of Sociology at Brown University, Providence, RI 02912. He directed the US2010 Project through which this research was originally supported, and his research interests include contemporary and historical residential and labor market patterns in U.S. cities, urban change in China, and school segregation.

Brian J. Stults

BRIAN J. STULTS is Associate Professor in the College of Criminology and Criminal Justice at Florida State University, Tallahassee, FL 32306. His recent work addresses racial differences in arrest rates and variation in police force size as a result of perceived threat, fear, and prejudice.

Zengwang Xu

ZENGWANG XU is Assistant Professor in the Department of Geography at the University of Wisconsin, Milwaukee, WI 53201. His primary interests are to investigate the relation between persistent system-level patterns and individual-based processes and the effect of spatiality on the structure and function of evolving complex spatial networks and systems.
