Original Articles

Simultaneous edit-imputation and disclosure limitation for business establishment data

Pages 63-82 | Received 18 May 2016, Accepted 03 Nov 2016, Published online: 15 Dec 2016
 

ABSTRACT

Business establishment microdata typically are required to satisfy agency-specified edit rules, such as balance equations and linear inequalities. Inevitably some establishments' reported data violate the edit rules. Statistical agencies correct faulty values using a process known as edit-imputation. Business establishment data also must be heavily redacted before being shared with the public; indeed, confidentiality concerns lead many agencies not to share establishment microdata as unrestricted access files. When microdata must be heavily redacted, one approach is to create synthetic data, as done in the U.S. Longitudinal Business Database and the German IAB Establishment Panel. This article presents the first implementation of a fully integrated approach to edit-imputation and data synthesis. We illustrate the approach on data from the U.S. Census of Manufactures and present a variety of evaluations of the utility of the synthetic data. The paper also presents assessments of disclosure risks for several intruder attacks. We find that the synthetic data preserve important distributional features from the post-editing confidential microdata, and have low risks for the various attacks.
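To make the notion of edit rules concrete, the following is a minimal sketch of how a record might be checked against one balance equation and one linear inequality. It is not the authors' method or any agency's actual rule set: the variable names (`wages_production`, `wages_other`, `wages_total`, `employees`), the bounds `lo` and `hi`, and the function `violated_edits` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

Record = Dict[str, float]


def violated_edits(rec: Record, tol: float = 1e-9) -> List[str]:
    """Return the names of the illustrative edit rules that `rec` fails."""
    failures: List[str] = []

    # Balance equation (hypothetical): component wages must sum to the total.
    components = rec["wages_production"] + rec["wages_other"]
    if abs(components - rec["wages_total"]) > tol:
        failures.append("balance:wages")

    # Linear inequalities (hypothetical bounds on wages per employee):
    #   lo * employees <= wages_total <= hi * employees
    lo, hi = 5.0, 500.0
    if not (lo * rec["employees"] <= rec["wages_total"] <= hi * rec["employees"]):
        failures.append("ratio:wages_per_employee")

    return failures


# Two made-up records: one consistent, one violating both rules.
ok = {"wages_production": 600.0, "wages_other": 400.0,
      "wages_total": 1000.0, "employees": 20.0}
bad = {"wages_production": 600.0, "wages_other": 400.0,
       "wages_total": 1200.0, "employees": 1.0}

print(violated_edits(ok))
print(violated_edits(bad))
```

A record that fails any rule would be a candidate for edit-imputation. The check above only flags violations; it does not decide which reported values are faulty or how to replace them.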

Acknowledgments

The authors thank T. Kirk White for suggesting the regression analysis for plant-level productivity. We thank the late Lawrence Cox for suggesting the intruder scenario in which the establishment with the second-largest value uses the synthetic data and released totals to estimate the true largest value. The research was conducted while the authors were Special Sworn Status researchers of the U.S. Census Bureau at the Center for Economic Studies. Any opinions and conclusions expressed herein are those of the authors and do not necessarily represent the views of the U.S. Census Bureau or the National Science Foundation. All results have been reviewed to ensure that no confidential information is disclosed.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

This work was supported by the National Science Foundation [grant number SES-11-31897] and partially supported by the Charles Phelps Taft Research Center at the University of Cincinnati.
