ABSTRACT
In this paper we introduce %CEM, a macro package that allows researchers to automatically perform coarsened exact matching (CEM) in the SAS environment. CEM is a non-parametric matching method widely used to avoid the confounding influence of pre-treatment control variables and thus improve causal inference in quasi-experimental studies. %CEM provides a fully automated process that allows SAS users to perform CEM efficiently in fields where large data sets are common and SAS is the most popular statistical tool. In addition, the macro can be used to test several coarsening combinations of numeric variables. This option also provides a visual representation of the matching frontier, enabling researchers to select the optimal setting by taking into account both the imbalance and the percentage of matched units. The paper concludes with an empirical application that compares the computational performance and results of the available software alternatives (SAS, R and STATA) on multiple administrative data sets from a large regional database.
Acknowledgments
We would like to thank Stefano Iacus, Sergio Pontello, Lei Xuan and Dan Eshleman for the helpful comments and suggestions provided.
Disclosure statement
No potential conflict of interest was reported by the authors.
ORCiD
Paolo Berta http://orcid.org/0000-0003-0984-4288
Stefano Verzillo http://orcid.org/0000-0002-1895-8554
Notes
1. An alternative multidimensional balance measure called has recently been introduced in the literature [Citation18] with its % SAS macro code.
2. Speed performance was tested on a notebook with the following technical characteristics: OS Windows 7 (X64), an Intel(R) Core(TM) i5-2430M CPU Quad-Core processor running at 2.40 GHz with 4.00 GB of RAM. The following releases of the software under consideration were used: SAS 9.3, R 3.2.1 and STATA 13.
3. The bin width and, consequently, the number of bins are calculated by each software package using Scott's rule and according to its specific rounding approximation.
4. Execution time is highly dependent on the number of numerical variables. Indeed, with the automatic coarsening option, each additional variable multiplies the number of combinations of the standard coarsening options, so the number of combinations grows exponentially with the number of variables. For this reason, a priori knowledge of the research domain may lead researchers to the optimal binning of their numerical variables, with major time savings when performing the matching procedure.
5. The Fortune 500 is the ranking of the 500 largest US companies published by Fortune magazine.
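
Notes 3 and 4 can be illustrated with a minimal Python sketch. It is not the %CEM code, and none of its names come from the macro. It assumes the textbook form of Scott's rule, h = 3.49 s n^(-1/3), with s the sample standard deviation. It also rounds the number of bins up, which is only one possible choice, since each package applies its own rounding. The function names and the list of candidate coarsenings per variable are hypothetical.

```python
import math
import statistics


def scott_bin_width(values):
    """Bin width from Scott's rule: h = 3.49 * s * n^(-1/3)."""
    n = len(values)
    s = statistics.stdev(values)  # sample standard deviation
    return 3.49 * s * n ** (-1 / 3)


def scott_bin_count(values):
    """Number of bins covering the data range.

    Rounding up is an assumption made for this sketch; SAS, R and
    STATA may each round differently and return a different count.
    """
    h = scott_bin_width(values)
    if h == 0:  # all values identical: a single bin is enough
        return 1
    return max(1, math.ceil((max(values) - min(values)) / h))


def count_coarsening_combinations(options_per_variable):
    """Number of settings to test when every combination is tried.

    options_per_variable holds, for each numeric variable, the
    (hypothetical) number of candidate coarsenings considered.
    """
    return math.prod(options_per_variable)


if __name__ == "__main__":
    ages = list(range(1, 101))
    print(scott_bin_width(ages), scott_bin_count(ages))
    # Three variables with three candidate coarsenings each,
    # then the same setting with one more variable added.
    print(count_coarsening_combinations([3, 3, 3]))
    print(count_coarsening_combinations([3, 3, 3, 3]))
```

With three candidate coarsenings per variable, moving from three to four numeric variables raises the number of settings to evaluate from 27 to 81, which is why fixing the binning of some variables in advance can save substantial execution time.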