ABSTRACT
We study estimation and inference when there are multiple values (“matches”) for the explanatory variables and only one of the matches is correct. This problem often arises when two datasets are linked on the basis of information that does not uniquely identify regressor values. We offer two intuitive conditions that ensure consistent inference using the average of the possible matches in a linear framework. The first condition is the exogeneity of the false match with respect to the regression error. The second condition is a notion of exchangeability between the true and false matches. Conditional on the observed data, the probability that each match is correct is completely unrestricted. We perform a Monte Carlo study to investigate the estimator’s finite-sample performance relative to others proposed in the literature. Finally, we provide an empirical example revisiting a main area of application: the measurement of intergenerational elasticities in income. Supplementary materials for this article are available online.
ACKNOWLEDGMENTS
Joe Ferrie sparked our thinking about this particular setup. The authors thank the associate editor, two anonymous referees, Antonio Galvao, Paul Grieco, Adriana Lleras-Muney, and Chris Vickers for providing useful feedback. The authors also thank James Feigenbaum for answering questions about his article and for providing data and code. Earlier versions of this article were circulated under the title “A Simple Estimator for Datasets with Nonunique Identifiers.”