Abstract
This article presents a new method for optimal matching in observational studies based on mixed integer programming. Unlike widely used matching methods based on network algorithms, which attempt to achieve covariate balance by minimizing the total sum of distances between treated units and matched controls, this new method achieves covariate balance directly, either by minimizing both the total sum of distances and a weighted sum of specific measures of covariate imbalance, or by minimizing the total sum of distances while constraining the measures of imbalance to be less than or equal to certain tolerances. The inclusion of these extra terms in the objective function or the use of these additional constraints explicitly optimizes or constrains the criteria that will be used to evaluate the quality of the match. For example, the method minimizes or constrains differences in univariate moments, such as means, variances, and skewness; differences in multivariate moments, such as correlations between covariates; differences in quantiles; and differences in statistics, such as the Kolmogorov–Smirnov statistic, to minimize the differences in both location and shape of the empirical distributions of the treated units and matched controls. While balancing several of these measures, it is also possible to impose constraints for exact and near-exact matching, and fine and near-fine balance for more than one nominal covariate, whereas network algorithms can finely or near-finely balance only a single nominal covariate. From a practical standpoint, this method eliminates the guesswork involved in current optimal matching methods, and offers a controlled and systematic way of improving covariate balance by focusing the matching efforts on certain measures of covariate imbalance and their corresponding weights or tolerances. A matched case–control study of acute kidney injury after surgery among Medicare patients illustrates these features in detail. 
A new R package called mipmatch implements the method.
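As a sketch, the two formulations described above can be written as follows. The notation is assumed here for illustration and does not come from the abstract: $T$ and $C$ are the sets of treated units and potential controls, $\delta_{t,c} \ge 0$ is a covariate distance between treated unit $t$ and control $c$, $a_{t,c}$ is a binary variable equal to 1 when $c$ is matched to $t$, $m$ is the number of controls matched to each treated unit, $\mu_i(a)$ is the $i$-th measure of covariate imbalance, and $\omega_i$ and $\varepsilon_i$ are its weight and tolerance.

```latex
% Shared matching constraints (assumed notation): each treated unit
% receives m controls, and each control is used at most once.
\begin{align*}
  \sum_{c \in C} a_{t,c} &= m \quad \forall t \in T, &
  \sum_{t \in T} a_{t,c} &\le 1 \quad \forall c \in C, &
  a_{t,c} &\in \{0,1\}.
\end{align*}

% Formulation 1: imbalance measures enter the objective with weights.
\begin{equation*}
  \min_{a} \; \sum_{t \in T} \sum_{c \in C} \delta_{t,c}\, a_{t,c}
  \;+\; \sum_{i} \omega_i\, \mu_i(a)
\end{equation*}

% Formulation 2: imbalance measures are bounded by tolerances.
\begin{equation*}
  \min_{a} \; \sum_{t \in T} \sum_{c \in C} \delta_{t,c}\, a_{t,c}
  \quad \text{subject to} \quad \mu_i(a) \le \varepsilon_i \;\; \forall i
\end{equation*}

% Example imbalance measure: absolute difference in means of covariate i,
% where x_{c,i} is the value for control c and \bar{x}_{T,i} the treated mean.
\begin{equation*}
  \mu_i(a) = \left| \frac{1}{m\,|T|} \sum_{t \in T} \sum_{c \in C}
             x_{c,i}\, a_{t,c} \;-\; \bar{x}_{T,i} \right|
\end{equation*}
```

Because $m$ is fixed, the expression inside the absolute value is linear in $a$. The absolute value can therefore be handled with one auxiliary variable $z_i$ and the two linear constraints $z_i \ge \pm(\cdot)$, which keeps both formulations within mixed integer linear programming for this particular measure.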
Acknowledgments
The author is greatly indebted to Paul Rosenbaum for his insightful comments and kind encouragement, which have greatly benefited this work; Alex Goldstein, Ben Hansen, Jennifer Hill, Luke Keele, Lynn Selhat, Dylan Small, and Michael Sobel for their valuable suggestions; Rachel Kelz, Caroline Reinke, and Jeffrey Silber for medical explanations; and Daniel Bienstock, Monique Guignard-Spielberg, and Peter Hahn for their help in solving mixed integer programs. This work was supported by grants R01 DK073671 from NIH–NIDDK (National Institutes of Health–National Institute of Diabetes and Digestive and Kidney Diseases) and 1R01HS018355-01 from NIH–AHRQ (National Institutes of Health–Agency for Healthcare Research and Quality).