ABSTRACT
Despite their risk of misspecification, parametric models continue to be used in statistical practice because they are simple and convenient. In particular, efficient estimation procedures in parametric models are easy to describe and implement. Unfortunately, the same cannot be said of semiparametric and nonparametric models. While the latter often reflect the level of available scientific knowledge more appropriately, performing efficient inference in these models is generally challenging. The efficient influence function is a key analytic object that can potentially streamline the construction of asymptotically efficient estimators. However, deriving the efficient influence function theoretically requires specialized knowledge and is often difficult, even for experts. In this article, we present a novel representation of the efficient influence function and describe a numerical procedure for approximating its evaluation. The approach generalizes the nonparametric procedures of Frangakis et al. and Luedtke, Carone, and van der Laan to arbitrary models. We present theoretical results to support our proposal and illustrate the method in the context of several semiparametric problems. The proposed approach is an important step toward automating efficient estimation in general statistical models, thereby making realistic models more accessible in statistical analyses. Supplementary materials for this article are available online.
Supplementary Material
In the Supplementary Material, we verify that the Kullback-Leibler (KL) divergence and the Hellinger distance satisfy conditions (B1), (B2) and (B3), and thus that they are appropriate divergences for the proposed representation of the efficient influence function (EIF). We also verify directly that the proposed representation is valid in each of the four examples discussed in Section 5. Finally, we show that the use of an inappropriate divergence can lead to violations of the proposed representation. We do so by exhibiting an example based on parametric models in which the L2 norm of the difference of density functions is used as the divergence.
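For readers who want the three discrepancies named above in concrete form, the following is a minimal sketch that evaluates them for two discrete distributions. It uses standard textbook definitions and is not taken from the article: the function names and the example vectors p and q are hypothetical, the 1/2 normalization of the Hellinger distance is one common convention that may differ from the one used in the article, and the code does not check conditions (B1) to (B3).

```python
import math

def kl_divergence(p, q):
    """KL(p || q) = sum_x p(x) * log(p(x) / q(x)).

    Assumes q(x) > 0 wherever p(x) > 0; terms with p(x) = 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def hellinger_distance(p, q):
    """H(p, q) = sqrt(0.5 * sum_x (sqrt(p(x)) - sqrt(q(x)))^2).

    With this (assumed) normalization, H takes values in [0, 1].
    """
    total = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(0.5 * total)

def l2_distance(p, q):
    """L2 norm of the difference p - q of two probability mass functions."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# Hypothetical probability mass functions on a three-point sample space.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

print("KL(p || q):", kl_divergence(p, q))
print("Hellinger :", hellinger_distance(p, q))
print("L2        :", l2_distance(p, q))
```

All three quantities vanish when p equals q, but they penalize departures from q differently. The sketch only makes the definitions concrete and gives no indication of which divergences are appropriate for the proposed representation.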