Research Article

Recalculating ... : How Uncertainty in Local Labour Market Definitions Affects Empirical Findings


ABSTRACT

This paper evaluates the use of commuting zones as a local labour market definition. We revisit the seminal paper by Tolbert and Sizer and demonstrate the sensitivity of definitions to two features of the methodology: a cluster dissimilarity cut-off, or the count of clusters, and uncertainty in the input data. We show how these features impact empirical estimates using a standard application of commuting zones and an example from related literature. We conclude with advice to researchers on how to demonstrate the robustness of empirical findings to uncertainty in the definition of commuting zones.

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes

1 For surveys and summaries of local labour market definitions and clustering methods, see Casado-Díaz and Coombes (Citation2011) and Franconi, Ichim, and D’Aló (Citation2017). For a recent implementation of the TS method in Portugal, see Afonso and Venâncio (Citation2016).

2 Niedzielski, Horner, and Xiao (Citation2013) investigate MAUP issues in commuting flows, but focus on the sensitivity of summary measures (for excess commuting), rather than a multivariate analysis as with the studies cited above. Other studies also consider the optimal definition of traffic analysis zones, but that is a much smaller spatial scale and a different focus from the analysis of local labour markets. Afonso and Venâncio (Citation2016) report having checked for robustness of results with a lower average-linkage commuting flow threshold.

3 Briant, Combes, and Lafourcade (Citation2010) evaluate an area defined by commuting flows, but for comparison, they use either arbitrary spatial units of the same size or larger spatial units defined using different criteria.

4 ERS released commuting zone definitions based on 1980, 1990, and 2000 commuting data. All three definitions are available at http://www.ers.usda.gov/data-products/commuting-zones-and-labour-market-areas.aspx as of 2020-09-30. For an analysis of the historical methodology, including the use of expert opinion, see Fowler, Rhubart, and Jensen (Citation2016).

5 Journey to Work data on county to county commuting flows are available for the 1990 Census, the 2000 Census, and 5-year samples of the ACS at https://www.census.gov/hhes/commuting/data/commutingflows.html.

6 Employment status is based on responses to the question ‘Did this person work at any time LAST WEEK?’ Place of work is geocoded using the response to ‘At what location did this person work LAST WEEK?’ Residence location is compiled from the mailing frame of the Census.

7 The hierarchical clustering for this paper was generated with PROC CLUSTER in SAS software, Version 9.2 of the SAS System for Unix. Copyright © 2009 SAS Institute Inc. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc., Cary, NC, USA.

8 To visually illustrate how the clustering algorithm works, Appendix displays the results of the clustering procedure for different values of H, focusing just on counties in California. In the top left-hand corner, at a height of H=0.80, only a few counties have joined. As we increase the height to 0.88 and to 0.96, more counties are joined together. Finally, at a height of 1, almost all the counties have merged together, forming one large cluster and a few much smaller clusters.
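
The effect of the cut-off height H can be sketched on a toy example. The block below is a minimal illustration, not the authors' SAS procedure: it implements plain average-linkage clustering in pure Python and stops merging once the closest pair of clusters is more dissimilar than H. The function name `average_linkage_clusters` and the 4x4 dissimilarity matrix `D` are hypothetical, and the heights are not normalized as they are in PROC CLUSTER.

```python
from itertools import combinations

def average_linkage_clusters(dissim, height):
    """Merge clusters until the closest pair's average dissimilarity exceeds `height`."""
    clusters = [[i] for i in range(len(dissim))]

    def avg_dist(a, b):
        # Average-linkage distance: mean pairwise dissimilarity between members.
        return sum(dissim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        (x, y), best = min(
            (((p, q), avg_dist(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > height:
            break  # every remaining merge would sit above the cut-off
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return sorted(sorted(c) for c in clusters)

# Hypothetical dissimilarities between four counties (0 = fully integrated, 1 = no ties).
D = [
    [0.00, 0.70, 0.95, 1.00],
    [0.70, 0.00, 0.90, 1.00],
    [0.95, 0.90, 0.00, 0.99],
    [1.00, 1.00, 0.99, 0.00],
]

low = average_linkage_clusters(D, 0.80)   # [[0, 1], [2], [3]]
mid = average_linkage_clusters(D, 0.96)   # [[0, 1, 2], [3]]
high = average_linkage_clusters(D, 1.00)  # [[0, 1, 2, 3]]
```

As in the California maps, raising the height joins more counties, and at a height of 1 the toy counties collapse into a single cluster.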

9 We find that the clustering algorithm, when attempting to produce the same cluster count as TS1990 with a national run, retains a residual cluster that spreads across many states. Only with the lower cluster height, and more clusters, does the residual cluster break up. We leave evaluation of alternate clustering methods for future work, with the emphasis in the present analysis on the sensitivity of estimates to zone definitions. We also note that this residual cluster is a feature of the hierarchical procedure, which selects new clusters based only on their incremental optimality. We show an example of such a map, using a cut-off of 0.945, in the Appendix.

10 It should be noted that TS1990 also has some large zones, including some in Nevada, Idaho, and California.

11 One leading alternative clustering method is spectral clustering, in which a user specifies a number of clusters desired, and the algorithm minimizes total within-cluster distance.

12 Decisions on clustering methods, clustering counts, and validation criteria depend on the application and are inherently somewhat subjective. Because clustering is an unsupervised method, there may be no indication of the ideal number of clusters (Halkidi, Batistakis, and Vazirgiannis Citation2001).

13 Other designers have made different sizing choices. For example, Employment Areas in France, also based on commuting flows, are much smaller and would be equivalent to splitting the United States into over 4,700 units (Briant, Combes, and Lafourcade Citation2010).

14 In addition, because heights are normalized in the procedure, the dissimilarities of all clusters will change relative to the cut-off.

15 To project MOEs from the ACS to 1990 flows, we calculate a ratio, $\mathrm{MOE}_{ij}/f_{ij}$, representing the degree of uncertainty for a flow in the ACS. To reflect the range of MOEs across similarly sized flows, we calculate the mean and standard deviation of these ratios within flow size bins, defined by 1990 flow percentile, of: 0–50; 50–90; 90–95; 95–99; and 99+ (see Appendix Table 4). For each 1990 flow, we draw from the distribution of ratios calculated with the ACS in the corresponding bin. Note that the Census long-form is designed to be a one-in-six sample for one year, while the ACS data covers 5 years with a one-in-fifty sample each year. The smaller sample size of the ACS typically results in higher margins of error for comparable statistics, so the uncertainty implied by our implementation likely overstates the underlying MOEs in the 1990 flows. For more information on the construction of the ACS MOEs, see U.S. Census Bureau (Citation2014, pages 10–12).
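
The projection step can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the flows are synthetic, the helper names (`percentile_bins`, `ratio_moments`, `project_moes`) are hypothetical, and the ratio is drawn from a normal distribution truncated at zero, a distributional form the note does not specify.

```python
import random
import statistics
from bisect import bisect_right

BIN_EDGES = [50, 90, 95, 99]  # percentile bins: 0-50, 50-90, 90-95, 95-99, 99+

def percentile_bins(flows):
    """Assign each flow a bin index (0..4) from its percentile rank among all flows."""
    order = sorted(range(len(flows)), key=lambda i: flows[i])
    bins = [0] * len(flows)
    for rank, i in enumerate(order):
        bins[i] = bisect_right(BIN_EDGES, 100.0 * rank / len(flows))
    return bins

def ratio_moments(acs_flows, acs_moes):
    """Mean and standard deviation of MOE/flow within each ACS flow-size bin."""
    by_bin = {}
    for b, f, m in zip(percentile_bins(acs_flows), acs_flows, acs_moes):
        by_bin.setdefault(b, []).append(m / f)
    return {b: (statistics.mean(r), statistics.pstdev(r)) for b, r in by_bin.items()}

def project_moes(flows_1990, moments, rng):
    """Draw a ratio for each 1990 flow from its bin (assumed normal, floored at 0)."""
    bins = percentile_bins(flows_1990)
    return [f * max(0.0, rng.gauss(*moments[b])) for f, b in zip(flows_1990, bins)]

rng = random.Random(0)
# Synthetic stand-ins for ACS flows with MOEs and for 1990 flows (illustrative only).
acs_flows = [rng.randint(1, 5000) for _ in range(200)]
acs_moes = [rng.uniform(2.0, 6.0) * f ** 0.5 for f in acs_flows]
flows_1990 = [rng.randint(1, 5000) for _ in range(200)]

moments = ratio_moments(acs_flows, acs_moes)
moes_1990 = project_moes(flows_1990, moments, rng)
```

Binning by percentile rank lets small flows, whose MOEs are large relative to the flow, inherit the wider ratio distribution observed for small ACS flows.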

16 In doing this, we assume, for simplicity, that flows $ij$ and $ik$ are independent for all $k$. In reality, it is likely that $\mathrm{corr}(ij, ik) < 0$, which means in our setting that we are understating the error by treating them as independent. In the Journey to Work data, there are likely some origin-destination pairs that are not reported due to the sample design. In our current resampling approach, we only resample from non-zero flows in the data. A more complete approach could model the likelihood that a reported zero is actually a positive flow, and resample accordingly, but that is beyond the scope of this paper. For more detail on the 1990 Decennial Census sample design, consult U.S. Census Bureau (Citation1992a).
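
A minimal sketch of this resampling, assuming each non-zero flow is redrawn independently from a normal distribution centred on the reported flow with standard error MOE/1.645 (the MOEs being 90% margins of error) and floored at zero. The normal form, the function name `resample_flows`, and the toy flows are illustrative assumptions, not the authors' implementation.

```python
import random

Z_90 = 1.645  # converts a 90% margin of error into a standard error

def resample_flows(flows, moes, rng):
    """Redraw every non-zero flow independently; zero flows are left out entirely."""
    draws = {}
    for pair, flow in flows.items():
        if flow <= 0:
            continue  # only non-zero origin-destination pairs are resampled
        draws[pair] = max(0.0, rng.gauss(flow, moes[pair] / Z_90))
    return draws

# Hypothetical county-to-county flows keyed by (origin, destination).
flows = {("A", "B"): 1200.0, ("A", "C"): 15.0, ("B", "C"): 0.0}
moes = {("A", "B"): 150.0, ("A", "C"): 20.0, ("B", "C"): 0.0}

draw = resample_flows(flows, moes, random.Random(1))
```

Each call yields one perturbed flow matrix; repeating the draw and rerunning the clustering gives a distribution of zone definitions.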

17 About 65% of flows are not statistically different from zero and are at risk of being censored, but these tend to be small flows and account for only 1.7% of jobs.
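
The two shares in this note can be computed as sketched below, assuming a flow counts as not statistically different from zero when its margin of error is at least as large as the flow itself. That rule, the function name `insignificant_shares`, and the four toy flows are assumptions for illustration, so the numbers do not reproduce the 65% and 1.7% figures.

```python
def insignificant_shares(flows, moes):
    """Return (share of flows, share of jobs) whose interval flow +/- MOE includes zero."""
    flagged = [f for f, m in zip(flows, moes) if f - m <= 0]
    return len(flagged) / len(flows), sum(flagged) / sum(flows)

# Hypothetical flows (jobs) and margins of error.
flows = [5, 10, 1000, 2000]
moes = [8, 12, 100, 150]

share_of_flows, share_of_jobs = insignificant_shares(flows, moes)
```

In this toy data half of the flows are flagged, yet they carry well under 1% of jobs, the same qualitative pattern the note reports.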

18 To get a sense for how robust these re-sampling results are over time, we replicate the analysis reported in using the 2009–2013 ACS data. This has the additional advantage of using the MOEs specifically constructed for the data, rather than projecting them onto the 1990 commuting flows. The results are shown in Figure 13 in the Appendix. Mismatched population is actually higher, with a median of almost 5% of the population and a large right tail. The distributions of clusters and counties (panels a and b) are tighter when using the JTW 2009–2013 data, but the average number of clusters (panel a) is smaller and the mean cluster size (panel b) is higher than the values in , which reflects higher integration between counties during this time period.

19 Bound and Holzer (Citation2000), Notowidigdo (Citation2020), and Autor, Dorn, and Hanson (Citation2013) use this measure of labour demand; the latter two papers estimate the effects of demand shocks on uptake of public assistance.

20 Observations approximately equal the product of 25 years by the number of clusters in each definition. A few observations are missing due to missingness in certain states for the early period in the BEA data.

21 We want to acknowledge that Autor, Dorn and Hanson were incredibly helpful in the process of replicating their paper, both in providing data and helping to troubleshoot, as well as being receptive to this exercise.

22 We use county-level manufacturing employment from the decennial census (Minnesota Population Center Citation2016), while Autor, Dorn, and Hanson (Citation2013) map the PUMS data at the PUMA level into the commuting zone using David Dorn’s crosswalk.

23 In addition to Foote, Kutzbach, and Vilhuber (Citation2020), the most recent code can be found at https://github.com/larsvilhuber/MobZ/.

24 Researchers might also use an alternative flows dataset to define local labour markets. For example, hiring outcomes data give the flow of workers acceding into new jobs by geographic origin and destination. For more information, see the Job-to-Job flows data produced by the Census Bureau’s Longitudinal Employer-Household Dynamics programme at https://lehd.ces.census.gov/data/.

Additional information

Funding

This work was supported by the National Science Foundation Grant SES-1131848.
