Inference for L-estimators of location using a bootstrap warping approach

Pages 2145-2150 | Received 04 Dec 2018, Accepted 13 Mar 2019, Published online: 03 Apr 2019

Abstract

In this note we propose a new semi-parametric bootstrap procedure, termed bootstrap warping, for hypothesis tests about a statistical functional. The procedure was motivated by empirical likelihood and bootstrap tilting techniques. It is computationally efficient and has a fixed number of parameters. We show that the warping procedure has good type I error control and has monotone power as a function of sample size and shift alternatives.

1. Introduction

Let $X_1, X_2, \ldots, X_n$ denote an i.i.d. sample drawn from an absolutely continuous population with cumulative distribution function (c.d.f.) denoted as $F$ and corresponding quantile function denoted as $Q(u) = F^{-1}(u)$. For the application described in this note we are interested in making inferences about a one-dimensional parameter of the form $\theta = T(F)$, where in our methodology $T(F)$ denotes a specific smooth statistical functional for measuring expectation and having the form
$$T(F) = E[g(X)j(F(X))] = \int g(x)j(F(x))\,dF = \int_0^1 g(Q(u))j(u)\,du. \tag{1.1}$$

In our applications we restrict $j(F(X))$ to be a smooth absolutely continuous weighting function such that $\int j(F(x))\,dF = \int_0^1 j(u)\,du = 1$, i.e. $j$ is essentially a p.d.f. with $j(u) > 0$, $u \in (0,1)$, and we assume $\int g(x)\,dF$ is bounded. For example, suppose the parameter of interest is the population mean; then the statistical functional has the well-known form $\theta = T(F) = E(g(X)j(F(X))) = \int x\,dF = \int_0^1 Q(u)\,du$, with $j(F(x)) = 1$, or alternatively $j(u) = 1$ and $g(x) = x$. The classic "bootstrap" estimator of $T(F)$ is given by replacing the c.d.f. $F$ with its empirical counterpart, $\hat F(x) = \sum_{i=1}^n I(x_i \le x)/n$, in Equation (1.1), or alternatively replacing $Q$ with its empirical counterpart, $\hat Q(u) = x_{[nu]+1:n}$, where $[\cdot]$ denotes the floor function and $X_{i:n}$ denotes the $i$th order statistic. Substituting $\hat F$ into Equation (1.1) for $F$ yields the empirical estimator of $T(F)$, which has the well-known form
$$T(\hat F) = \sum_{i=1}^n g(X_{i:n}) \int_{(i-1)/n}^{i/n} j(u)\,du. \tag{1.2}$$
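To make (1.2) concrete, the following is a minimal sketch (ours, not from the paper) that evaluates the plug-in estimator for an arbitrary smooth weight density $j$ by numerically integrating $j$ over each cell $((i-1)/n, i/n]$; the function names are our own.

```python
# A minimal sketch (not from the paper) of the plug-in estimator (1.2):
# T(F-hat) = sum_i g(X_{i:n}) * integral of j(u) over ((i-1)/n, i/n].
import numpy as np
from scipy.integrate import quad

def plug_in_estimator(x, g=lambda t: t, j=lambda u: 1.0):
    """Evaluate T(F-hat) from Eq. (1.2) for weight density j and function g."""
    xs = np.sort(x)
    n = len(xs)
    # Integrate j over each cell ((i-1)/n, i/n]; with j = 1 every cell
    # weight is 1/n and the estimator reduces to the sample mean.
    weights = np.array([quad(j, (i - 1) / n, i / n)[0] for i in range(1, n + 1)])
    return np.sum(g(xs) * weights)
```

With the defaults $g(x) = x$ and $j(u) = 1$ this returns the sample mean, matching the example above.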

Some classic examples for $T(\hat F)$ include kernel density and quantile estimators, sample moment estimators and L-estimators; see Serfling (1980) for a technical overview of estimators having this form relative to their asymptotic properties.

Now, suppose we are interested in testing a hypothesis about the given statistical functional $T(F)$ having the form $H_0: T(F_0) = \theta_0$ versus $H_1: T(F_0) > \theta_0$, or without loss of generality $H_1: T(F_0) < \theta_0$, in a semiparametric fashion. Note that we will touch on two-sided tests later in this note. Popular nonparametric approaches for testing hypotheses of this type are given by the well-known empirical likelihood (EL) method due to Owen (1988) and bootstrap tilting methods such as exponential tilting or other multinomial-based resampling schemes, e.g. see Davison and Hinkley (1997).

The nonparametric EL and bootstrap approaches provide the motivation for our new semiparametric testing methodology. The key idea behind the EL and bootstrap tilting approaches is to find the nonparametric maximum likelihood estimator for the probability density function (p.d.f.) $f_0$ and c.d.f. $F_0$ given the constraint $T(F_0) = \theta_0$ prescribed under $H_0$, as estimated by its empirical counterpart $T(\hat F_0) = \theta_0$, where $\hat F_0 = \sum_{i=1}^n v_i I(x_i \le x)$ and the $v_i$ parameters corresponding to $f_0$ sum to 1 and are bounded between 0 and 1. The common definition for the $v_i$ parameters in the continuous case for the discretized model is given as $v_i = F(x_i) - F(x_i^-)$, where $F(x) = P(X \le x)$ and $F(x^-) = P(X < x)$, respectively; see Owen (1988) for the technical argument pertaining to this definition.

In the most common scenario the likelihood under the "unconstrained" alternative hypothesis yields the classic estimates of $\hat v_i = 1/n$ for simple statistics such as the sample mean. Other weights may occur for functionals corresponding to trimmed estimators, e.g. see Qin and Tsao (2002) with respect to the weights for the trimmed mean. The weights, $v_i$, under the null hypothesis are generally determined by minimizing a given distance measure such as the Kullback-Leibler distance $D(v, v_0) = \sum_{i=1}^n v_i \log(n v_i)$, where the $1 \times n$ vectors $v = (1/n, 1/n, \ldots, 1/n)$ and $v_0 = (v_1, v_2, \ldots, v_n)$. Alternatively, one may use constrained maximum likelihood approaches for determining the vector $v_0 = (v_1, v_2, \ldots, v_n)$ under $H_0$; see Vexler and Gurevich (2010) for a typical model scenario. We use this idea of a discretized model as a starting point for developing an alternative inferential procedure based on smooth statistical functionals, using what we term statistical warping as defined in the next section.
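For orientation in the mean case, the constrained EL weights have a standard closed form (a textbook result from the EL literature, e.g. Owen 1988, included here only as background; it is not part of the warping method): maximizing $\prod_i v_i$ subject to $\sum_i v_i = 1$ and $\sum_i v_i x_i = \theta_0$ yields
$$v_i = \frac{1}{n}\cdot\frac{1}{1 + \lambda(x_i - \theta_0)}, \qquad \text{where } \lambda \text{ solves } \sum_{i=1}^n \frac{x_i - \theta_0}{1 + \lambda(x_i - \theta_0)} = 0.$$
Note that this requires solving for a data-dependent Lagrange multiplier $\lambda$, which is the layer of computation the warping approach avoids.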

In Section 2 we outline the bootstrap warping procedure and follow this with a simulation study in Section 3.

2. Bootstrap warping and hypothesis testing

The key features of bootstrap warping, as contrasted with the EL and bootstrap tilting approaches, are that it is a semi-parametric approach and that the number of parameters in the model is reduced from $n-1$ to 2, conditional on the observed data, i.e. we "warp" the observed e.d.f. $\hat F(x) = \sum_{i=1}^n I(x_i \le x)/n$ rather than treating each discretized segment on the continuum as a parameter. In addition, our resampling scheme follows the classic bootstrap multinomial resampling scheme with cell probabilities of $1/n$, versus bootstrap tilting, which requires the weights to be determined conditional on the dataset under investigation, thus adding a layer of complexity to the computational components of these problems. The direct benefit of this parameterization is computational ease without suffering the "curse of dimensionality" associated with big data scenarios. This will be described in detail below. Additionally, in terms of future work, covariate adjustments may be made through the warping model parameters, thus extending the utility of this approach to more complex settings.

In terms of testing $H_0: T(F_0) = \theta_0$ versus $H_1: T(F_0) \ne \theta_0$ we need to first define $T(\hat F_0)$ relative to obtaining an empirical version of the constraint $T(F_0) = \theta_0$. Towards this end we define the warped empirical estimator of $T(F_0)$ based on formulation (1.1) as
$$T(\hat F_0; \alpha, \beta) = \sum_{i=1}^n g(X_{i:n}) \left\{ K_{\alpha,\beta}\!\left[J\!\left(\tfrac{i}{n}\right)\right] - K_{\alpha,\beta}\!\left[J\!\left(\tfrac{i-1}{n}\right)\right] \right\} \tag{2.1}$$
$$= \sum_{i=1}^n g(X_{i:n}) f(X_{i:n}), \tag{2.2}$$
where $J(v) = \int_0^v j(u)\,du$, $f(X_{i:n})$ is a weighting function defined more formally below at Equation (2.5), $j(\cdot)$ is defined at (1.1) and we define $K_{\alpha,\beta}$ such that $K_{\alpha,\beta}(t) = t$ under $H_0$. For example, if we were interested in testing about $E(g(X))$ then
$$T(\hat F_0; \alpha, \beta) = \sum_{i=1}^n g(X_{i:n}) \left[ K_{\alpha,\beta}\!\left(\tfrac{i}{n}\right) - K_{\alpha,\beta}\!\left(\tfrac{i-1}{n}\right) \right] \tag{2.3}$$
and under $H_0$ true we would have $T(\hat F_0) = \sum_{i=1}^n g(X_{i:n})/n$.

Comment. Note that $T(\hat F_0; \alpha, \beta)$ at Equation (2.3) is used specifically to generate the null distribution for the estimator $T(\hat F)$ at Equation (1.2) and is not meant to be an alternative estimator for $\theta$.

The components of the weighting function in Equation (2.2), denoted as $K_{\alpha,\beta}(\cdot)$, are defined as the c.d.f. of a Kumaraswamy distribution and given as
$$K_{\alpha,\beta}(u) = 1 - (1 - u^\alpha)^\beta, \tag{2.4}$$
where $0 < u < 1$, $\alpha > 0$ and $\beta > 0$. The choice of the Kumaraswamy distribution in terms of a weighting function is due to its numerical tractability and flexibility in terms of the relative shapes it contains, i.e. our test will be sensitive to a number of alternatives given $H_1$ via the choice of this weighting function.
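For later reference (Theorem 1 below involves the Kumaraswamy density $k_{\alpha,\beta} = dK_{\alpha,\beta}$), differentiating (2.4) yields the standard form
$$k_{\alpha,\beta}(u) = \alpha\beta\, u^{\alpha-1}(1 - u^\alpha)^{\beta-1}, \qquad 0 < u < 1,$$
and note that $K_{1,1}(u) = u$, i.e. at $\alpha = \beta = 1$ the warping reduces to the identity map required under $H_0$.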

Our semi-parametric density utilized within Equation (2.2) is now defined as a discretized-type model, similar to what is used in the EL methodology and bootstrap tilting, and given as
$$f_{\alpha,\beta}(X_{i:n}) = F_{\alpha,\beta}(X_{i:n}) - F_{\alpha,\beta}(X_{i:n}^-) = K_{\alpha,\beta}\!\left[J\!\left(\tfrac{i}{n}\right)\right] - K_{\alpha,\beta}\!\left[J\!\left(\tfrac{i-1}{n}\right)\right], \tag{2.5}$$
where $F(x) = P(X \le x)$ and $F(x^-) = P(X < x)$, and $f(X_{i:n})$ represents a point mass corresponding to the $i$th order statistic. The Kumaraswamy distribution was chosen over other candidate distributions, e.g. the beta distribution, due to its well-behaved numerical properties and relatively straightforward parameterization. See Jones (2009) for a detailed description of the Kumaraswamy distribution and a description of its close relationship to the beta distribution. In essence $f(X_{i:n})$ serves as a standard weighting function such that when $\alpha = 1$ and $\beta = 1$ then $T(\hat F; \alpha, \beta)$ equates to $T(\hat F_0; \alpha, \beta)$.

The test of interest in this note is given as $H_0: T(F_0) = \theta_0$ versus $H_1: T(F_0) > \theta_0$. As in EL methods and bootstrap tilting, the first step is to maximize the constrained pseudo-likelihood
$$L_{\alpha,\beta} = \prod_{i=1}^n f_{\alpha,\beta}(X_{i:n}), \tag{2.6}$$
with respect to $\alpha$ and $\beta$ and under the constraint $H_0: T(F_0) = \theta_0$, where $f(X_{i:n})$ is defined at Equation (2.5). Clearly, $\alpha = 1$ and $\beta = 1$ given $H_0$ is true.

The bootstrap resampling scheme for our inferential method is then as follows (a code sketch is given after the list):

  1. Calculate the observed test statistic $T(\hat F)$.

  2. Obtain $\hat\alpha$ and $\hat\beta$ from Equation (2.6).

  3. Generate $B$ nonparametric bootstrap samples of size $n$, i.e. generate $n$ uniform $(0,1)$ random variables and apply $\hat Q(u) = x_{[nu]+1:n}$ to those randomly generated uniform variates.

  4. Calculate $T_i^*(\hat F_0 \,|\, \hat\alpha, \hat\beta)$ from Equation (2.2), replacing $\alpha$ with $\hat\alpha$ and $\beta$ with $\hat\beta$ from step (2), for $i = 1, 2, \ldots, B$.

  5. Calculate the approximate one-sided bootstrap p-value $p_{boot} = \sum_{i=1}^B I(T_i^*(\hat F_0 \,|\, \hat\alpha, \hat\beta) > T(\hat F))/B$, where $T(\hat F)$ is the observed estimator defined at Equation (1.2).
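To make steps (1)-(5) concrete, the following is a minimal end-to-end sketch for the simplest case of the mean ($g(x) = x$, $j(u) = 1$), assuming SciPy's SLSQP optimizer for the constrained fit of (2.6); the function names and implementation choices here are ours, not the authors'.

```python
# A minimal sketch of the bootstrap warping test for the mean case
# (g(x) = x, j(u) = 1); our construction, under stated assumptions.
import numpy as np
from scipy.optimize import minimize

def kuma_cdf(u, a, b):
    """Kumaraswamy c.d.f. K_{a,b}(u) = 1 - (1 - u^a)^b, Eq. (2.4)."""
    return 1.0 - (1.0 - u**a)**b

def warp_weights(n, a, b):
    """Point masses f_{a,b}(X_{i:n}) = K(i/n) - K((i-1)/n), Eq. (2.5) with J(v) = v."""
    grid = np.arange(n + 1) / n
    return np.diff(kuma_cdf(grid, a, b))

def fit_alpha_beta(x, theta0):
    """Step 2: maximize the pseudo-likelihood (2.6) subject to T(F0; a, b) = theta0."""
    xs = np.sort(x)
    n = len(xs)

    def neg_loglik(p):
        return -np.sum(np.log(warp_weights(n, p[0], p[1])))

    def constraint(p):
        return np.sum(xs * warp_weights(n, p[0], p[1])) - theta0

    res = minimize(neg_loglik, x0=[1.0, 1.0],
                   bounds=[(1e-4, None), (1e-4, None)],
                   constraints=[{"type": "eq", "fun": constraint}],
                   method="SLSQP")
    return res.x

def warp_test(x, theta0, B=250, rng=None):
    """One-sided bootstrap warping p-value for H0: E(X) = theta0 vs. >."""
    rng = np.random.default_rng(rng)
    xs = np.sort(x)
    n = len(xs)
    t_obs = xs.mean()                            # step 1: T(F-hat), Eq. (1.2)
    a_hat, b_hat = fit_alpha_beta(x, theta0)     # step 2
    w = warp_weights(n, a_hat, b_hat)
    t_star = np.empty(B)
    for i in range(B):                           # steps 3-4
        boot = np.sort(rng.choice(xs, size=n, replace=True))
        t_star[i] = np.sum(boot * w)             # Eq. (2.2) at (a-hat, b-hat)
    return np.mean(t_star > t_obs)               # step 5: p-boot
```

Note that drawing a bootstrap sample with replacement is equivalent to generating $n$ uniforms and applying $\hat Q(u) = x_{[nu]+1:n}$ as in step (3), and that with $\hat\alpha = \hat\beta = 1$ the weights reduce to $1/n$, so the resampled statistics are ordinary bootstrap means, as expected under $H_0$.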

For a test of $H_0: T(F_0) = \theta_0$ versus $H_1: T(F_0) < \theta_0$ simply reverse the inequality in step (5) above. For the test $H_0: T(F_0) = \theta_0$ versus $H_1: T(F_0) \ne \theta_0$ there is an added assumption of symmetry of the distribution of $T_i^*(\hat F_0 \,|\, \hat\alpha, \hat\beta)$ under $H_0$. Under this assumption the two-sided p-value is given as $p_{boot} = \sum_{i=1}^B I(|T_i^*(\hat F_0 \,|\, \hat\alpha, \hat\beta)| > |T(\hat F)|)/B$. In general, for most of the tests of interest the test statistic will have an asymptotic normal distribution, thus most two-sided tests should approximately satisfy the symmetry assumption, i.e. the statistics are based on smooth functions, which in turn lend themselves to well-behaved and symmetric bootstrap resampling distributions.

As with similar bootstrapping methodologies for inference, e.g. see Davison and Hinkley (1997), the key is that $T(\hat F_0; \alpha, \beta)$ is a consistent estimator of $T(F_0)$ under $H_0$. In the methodology presented above this holds since $\alpha = 1$ and $\beta = 1$ under $H_0$, $\hat F$ converges to $F$, and $T(\hat F; \alpha, \beta)$ converges to $T(F)$, the statistical functional of interest, given the smoothness conditions outlined earlier, e.g. see van der Vaart (1998). The large sample justification of this concept is given by the following theorem:

Theorem 1.

Under $H_0: T(F_0) = \theta_0$ and as $n \to \infty$, $\sqrt{n}(\hat\alpha - 1, \hat\beta - 1)$ has a centered bivariate normal distribution with variance-covariance matrix $B^{-1}\Sigma B^{-1}$, where $B$ is the standard maximum likelihood based information matrix associated with the Kumaraswamy density, $k_{\alpha,\beta} = dK_{\alpha,\beta}$, and $\Sigma$ is the variance-covariance matrix of a 2-dimensional random vector whose components are given by
$$\partial \log f_{\alpha,\beta}(X_{i:n})/\partial\alpha + W_\alpha(X_{i:n}), \tag{2.7}$$
$$\partial \log f_{\alpha,\beta}(X_{i:n})/\partial\beta + W_\beta(X_{i:n}), \tag{2.8}$$
where
$$W_\alpha(X_{i:n}) = \int I(F_{\alpha,\beta}(X_{i:n}) < u)\, \frac{\partial^2}{\partial\alpha\,\partial u} \log(f_{\alpha,\beta}(u))\, dF_{\alpha,\beta}(u), \tag{2.10}$$
$$W_\beta(X_{i:n}) = \int I(F_{\alpha,\beta}(X_{i:n}) < u)\, \frac{\partial^2}{\partial\beta\,\partial u} \log(f_{\alpha,\beta}(u))\, dF_{\alpha,\beta}(u). \tag{2.11}$$

Proof.

The technical details have been worked out in an elegant fashion for the case of a semi-parametric copula model with marginal distribution functions estimated by the empirical distribution function estimator. The result in Theorem 1 follows directly from the theoretical developments used in Section 4 of Genest, Ghoudi, and Rivest (1995) by simply replacing the multivariate copula function with the univariate beta density, which is essentially a special case of the higher dimension copula model. Estimates of the variance-covariance matrix $B^{-1}\Sigma B^{-1}$ are not as straightforward to obtain and we recommend bootstrap resampling for this purpose.
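One way to operationalize the recommended bootstrap estimate of $B^{-1}\Sigma B^{-1}$ is sketched below; this is our own construction (not the authors'), reusing the hypothetical `fit_alpha_beta` function from the sketch in Section 2.

```python
# A possible bootstrap estimate (our construction) of the covariance of
# sqrt(n) * (alpha-hat, beta-hat) under H0, by refitting the constrained
# pseudo-likelihood on each resample.
import numpy as np

def boot_cov(x, theta0, B=500, rng=None):
    rng = np.random.default_rng(rng)
    x = np.asarray(x)
    n = len(x)
    ests = np.empty((B, 2))
    for b in range(B):
        xb = rng.choice(x, size=n, replace=True)
        ests[b] = fit_alpha_beta(xb, theta0)   # refit (alpha, beta) per resample
    return n * np.cov(ests, rowvar=False)      # scale by n for the sqrt(n) normalization
```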

3. Simulation results

For our simulation study we focused on the trimmed mean with known statistical functional given as
$$T_\gamma(F) = \frac{1}{1 - 2\gamma} \int_{Q(\gamma)}^{Q(1-\gamma)} x\,dF = \frac{1}{1 - 2\gamma} \int_\gamma^{1-\gamma} Q(u)\,du, \qquad 0 < \gamma < 1/2.$$
Similar results in terms of behavior hold for moment estimators and kernel estimators and are not presented here. We centered our simulation study for the trimmed mean on the hypothesis test $H_0: T(F_0) = 0$ versus $H_1: T(F_0) > 0$ at type I error rate 0.05 for symmetric distributions with trimming proportions $\gamma = 0, 0.1, 0.2$ and samples of size $n = 10, 20, 50$. For each simulation result we utilized 1000 Monte Carlo replications with the number of bootstrap resamples set to $B = 250$. For the exponential distribution we tested $H_0: T(F_0) = T_\gamma(F)$ versus $H_1: T(F_0) > T_\gamma(F)$ with shifted exponential alternatives. For the $\gamma$-trimmed mean we can simplify (2.3) such that
$$T(\hat F_0; \alpha, \beta) = \sum_{i=k+1}^{n-k} g(X_{i:n}) \left[ K_{\alpha,\beta}\!\left(\tfrac{i-k}{n-2k}\right) - K_{\alpha,\beta}\!\left(\tfrac{i-1-k}{n-2k}\right) \right], \tag{3.1}$$
where $k = [n\gamma]$. For power examinations we used shift alternatives of $\delta = 0.5$ and $\delta = 1$ from $H_0$. The type I error was estimated from the 1000 Monte Carlo replications as the proportion of times $p_{boot}$ was less than the nominal level of 0.05.
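For concreteness, a minimal sketch of the warped trimmed-mean statistic (3.1) follows; this is our own code, assuming the cell arguments $(i-k)/(n-2k)$ as written above, so that at $\alpha = \beta = 1$ the weights reduce to $1/(n-2k)$ and the statistic reduces to the ordinary $\gamma$-trimmed mean.

```python
# A minimal sketch (ours, under the stated assumption on cell arguments)
# of the gamma-trimmed warped statistic in Eq. (3.1).
import numpy as np

def trimmed_warp_stat(x, gamma, a, b):
    """Warped trimmed-mean statistic, Eq. (3.1), with k = [n * gamma]."""
    xs = np.sort(x)
    n = len(xs)
    k = int(np.floor(n * gamma))
    m = n - 2 * k                              # number of retained order statistics
    grid = np.arange(m + 1) / m                # cell boundaries (i - k) / (n - 2k)
    w = np.diff(1.0 - (1.0 - grid**a)**b)      # Kumaraswamy cell masses, Eq. (2.4)
    return np.sum(xs[k:n - k] * w)             # sum over i = k+1, ..., n-k
```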

The results of our simulation study are presented in Table 1. We see that the type I error is controlled at the nominal level and that fluctuations about that level are primarily due to simulation error. The power is monotone increasing in $\delta$ and $n$. As compared with an optimal scenario such as a t-test under normality with $\delta = 0.5$ and $\gamma = 0$, the one-sample t-test has power of 0.427, 0.695 and 0.967 as compared to the warping powers of 0.369, 0.656 and 0.966 for samples of size $n$ of 10, 20 and 50, which yields relative efficiencies of 86.4%, 94.4% and 99.8%, respectively. By comparison the empirical likelihood approach yields powers of 0.414, 0.640 and 0.960 for samples of size $n$ of 10, 20 and 50 under the same scenarios. However, it should be noted that the type I error for the empirical likelihood approach was inflated at 0.084 at $\delta = 0$ and $n = 10$; hence the power value of 0.414 at this sample size is in turn inflated due to a much higher than desired type I error level as compared to the warping approach.

Table 1. Type I error set to 0.05 ($\delta = 0$) and power ($\delta > 0$) at trimming proportions $\gamma = 0, 0.1$ and 0.2.

Acknowledgments

This work was supported by Roswell Park Cancer Institute and National Cancer Institute (NCI) grant P30CA016056, NRG Oncology Statistical and Data Management Center grant U10CA180822 and IOTN Moonshot grant U24CA232979-01. We wish to thank the reviewers for their thoughtful comments, which led to an improved version of this work.
