Original Articles

Simultaneous edit-imputation and disclosure limitation for business establishment data

Pages 63-82 | Received 18 May 2016, Accepted 03 Nov 2016, Published online: 15 Dec 2016
 

ABSTRACT

Business establishment microdata typically are required to satisfy agency-specified edit rules, such as balance equations and linear inequalities. Inevitably some establishments' reported data violate the edit rules. Statistical agencies correct faulty values using a process known as edit-imputation. Business establishment data also must be heavily redacted before being shared with the public; indeed, confidentiality concerns lead many agencies not to share establishment microdata as unrestricted access files. When microdata must be heavily redacted, one approach is to create synthetic data, as done in the U.S. Longitudinal Business Database and the German IAB Establishment Panel. This article presents the first implementation of a fully integrated approach to edit-imputation and data synthesis. We illustrate the approach on data from the U.S. Census of Manufactures and present a variety of evaluations of the utility of the synthetic data. The paper also presents assessments of disclosure risks for several intruder attacks. We find that the synthetic data preserve important distributional features from the post-editing confidential microdata, and have low risks for the various attacks.
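To make the notion of edit rules concrete, the following is a minimal sketch of how a balance equation and a linear inequality (ratio) edit might be checked for a single establishment record. The field names, the tolerance, and the ratio bounds are hypothetical and chosen only for illustration; they are not the edit rules used by the agency or in the article.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical fields, chosen only for illustration.
    total_wages: float
    production_wages: float
    nonproduction_wages: float
    employees: float


def violated_edits(r, tol=1e-6, min_ratio=5.0, max_ratio=200.0):
    """Return the names of the (assumed) edit rules that record r fails."""
    failed = []
    # Balance edit: the wage components must sum to the reported total.
    if abs(r.production_wages + r.nonproduction_wages - r.total_wages) > tol:
        failed.append("balance")
    # Linear inequality edit, written as a ratio bound:
    # min_ratio * employees <= total_wages <= max_ratio * employees.
    if not (min_ratio * r.employees <= r.total_wages <= max_ratio * r.employees):
        failed.append("ratio")
    return failed


consistent = Record(total_wages=100.0, production_wages=60.0,
                    nonproduction_wages=40.0, employees=2.0)
faulty = Record(total_wages=100.0, production_wages=60.0,
                nonproduction_wages=30.0, employees=2.0)
```

Under these assumed rules, `violated_edits(consistent)` returns an empty list, while `violated_edits(faulty)` flags the balance edit. A record flagged in this way is the kind that edit-imputation would need to correct.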

Acknowledgments

The authors thank T. Kirk White for suggesting the regression analysis for plant-level productivity. We thank the late Lawrence Cox for suggesting the intruder scenario in which the establishment with the second-largest value uses the synthetic data and released totals to estimate the true largest value. The research was conducted while the authors were Special Sworn Status researchers of the U.S. Census Bureau at the Center for Economic Studies. Any opinions and conclusions expressed herein are those of the authors and do not necessarily represent the views of the U.S. Census Bureau or the National Science Foundation. All results have been reviewed to ensure that no confidential information is disclosed.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

This work was supported by the National Science Foundation [grant number SES-11-31897] and partially supported by the Charles Phelps Taft Research Center at the University of Cincinnati.
