Inference for L-estimators of location using a bootstrap warping approach

Pages 2145-2150 | Received 04 Dec 2018, Accepted 13 Mar 2019, Published online: 03 Apr 2019

Abstract

In this note we propose a new semi-parametric bootstrap procedure, termed bootstrap warping, for hypothesis tests about a statistical functional. The procedure was motivated by empirical likelihood and bootstrap tilting techniques. It is computationally efficient and has a fixed number of parameters. We show that the warping procedure has good type I error control and has monotone power as a function of sample size and shift alternatives.

1. Introduction

Let $X_1, X_2, \ldots, X_n$ denote an i.i.d. sample drawn from an absolutely continuous population with cumulative distribution function (c.d.f.) denoted as $F$ and corresponding quantile function denoted as $Q(u) = F^{-1}(u)$. For the application described in this note we are interested in making inferences about a one-dimensional parameter of the form $\theta = T(F)$, where in our methodology $T(F)$ denotes a specific smooth statistical functional for measuring expectation and having the form
$$T(F) = E[g(X)j(F(X))] = \int g(x)j(F(x))\,dF = \int_0^1 g(Q(u))j(u)\,du. \tag{1.1}$$

In our applications we restrict $j(F(X))$ to be a smooth absolutely continuous weighting function such that $\int j(F(x))\,dF = \int_0^1 j(u)\,du = 1$, i.e. $j$ is essentially a p.d.f. with $j(u) > 0$, $u \in (0,1)$, and we assume $\int g(x)\,dF$ is bounded. For example, suppose the parameter of interest is the population mean; then the statistical functional has the well-known form $\theta = T(F) = E(g(X)j(F(X))) = \int x\,dF = \int_0^1 Q(u)\,du$, with $j(F(x)) = 1$, or alternatively $j(u) = 1$ and $g(x) = x$. The classic "bootstrap" estimator of $T(F)$ is given by replacing the c.d.f. $F$ with its empirical counterpart, $\hat F(x) = \sum_{i=1}^n I(x_i \le x)/n$, in Equation (1.1), or alternatively replacing $Q$ with its empirical counterpart, $\hat Q(u) = x_{[nu]+1:n}$, where $[\cdot]$ denotes the floor function and $X_{i:n}$ denotes the $i$th order statistic. Substituting $\hat F$ into Equation (1.1) for $F$ yields the empirical estimator of $T(F)$, which has the well-known form
$$T(\hat F) = \sum_{i=1}^n g(X_{i:n}) \int_{(i-1)/n}^{i/n} j(u)\,du. \tag{1.2}$$
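To make (1.2) concrete, the following is a minimal sketch (ours, not from the paper) that evaluates the plug-in estimator for an arbitrary smooth weight density $j$ by numerically integrating $j$ over each cell $((i-1)/n, i/n]$; the function names are our own.

```python
# A minimal sketch (not from the paper) of the plug-in estimator (1.2):
# T(F-hat) = sum_i g(X_{i:n}) * integral of j(u) over ((i-1)/n, i/n].
import numpy as np
from scipy.integrate import quad

def plug_in_estimator(x, g=lambda t: t, j=lambda u: 1.0):
    """Evaluate T(F-hat) from Eq. (1.2) for weight density j and function g."""
    xs = np.sort(x)
    n = len(xs)
    # Integrate j over each cell ((i-1)/n, i/n]; with j = 1 every cell
    # weight is 1/n and the estimator reduces to the sample mean.
    weights = np.array([quad(j, (i - 1) / n, i / n)[0] for i in range(1, n + 1)])
    return np.sum(g(xs) * weights)
```

With the defaults $g(x) = x$ and $j(u) = 1$ this returns the sample mean, matching the example above.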

Some classic examples for $T(\hat F)$ include kernel density and quantile estimators, sample moment estimators and L-estimators; see Serfling (1980) for a technical overview of estimators having this form relative to their asymptotic properties.

Now, suppose we are interested in testing a hypothesis about the given statistical functional $T(F)$ having the form $H_0: T(F_0) = \theta_0$ versus $H_1: T(F_0) > \theta_0$, or without loss of generality $H_1: T(F_0) < \theta_0$, in a semiparametric fashion. Note that we will touch on two-sided tests later in this note. Popular nonparametric approaches for testing hypotheses of this type are given by the well-known empirical likelihood (EL) method due to Owen (1988) and bootstrap tilting methods such as exponential tilting or other multinomial-based resampling schemes, e.g. see Davison and Hinkley (1997).

The nonparametric EL and bootstrap approaches provide the motivation for our new semiparametric testing methodology. The key idea behind the EL and bootstrap tilting approaches is to find the nonparametric maximum likelihood estimator for the probability density function (p.d.f.) $f_0$ and c.d.f. $F_0$ given the constraint $T(F_0) = \theta_0$ prescribed under $H_0$, as estimated by its empirical counterpart $T(\hat F_0) = \theta_0$, where $\hat F_0 = \sum_{i=1}^n v_i I(x_i \le x)$ and the $v_i$ parameters corresponding to $f_0$ sum to 1 and are bounded between 0 and 1. The common definition for the $v_i$ parameters in the continuous case for the discretized model is given as $v_i = F(x_i) - F(x_i^-)$, where $F(x) = P(X \le x)$ and $F(x^-) = P(X < x)$, respectively; see Owen (1988) for the technical argument pertaining to this definition.

In the most common scenario the likelihood under the "unconstrained" alternative hypothesis yields the classic estimates of $\hat v_i = 1/n$ for simple statistics such as the sample mean. Other weights may occur for functionals corresponding to trimmed estimators, e.g. see Qin and Tsao (2002) with respect to the weights for the trimmed mean. The weights, $v_i$, under the null hypothesis are generally determined by minimizing a given distance measure such as the Kullback-Leibler distance $D(v, v_0) = \sum_{i=1}^n v_i \log(n v_i)$, where the $1 \times n$ vectors $v = (1/n, 1/n, \ldots, 1/n)$ and $v_0 = (v_1, v_2, \ldots, v_n)$. Alternatively, one may use constrained maximum likelihood approaches for determining the vector $v_0 = (v_1, v_2, \ldots, v_n)$ under $H_0$; see Vexler and Gurevich (2010) for a typical model scenario. We use this idea of a discretized model as a starting point for developing an alternative inferential procedure based on smooth statistical functionals, using what we term statistical warping as defined in the next section.
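For orientation in the mean case, the constrained EL weights have a standard closed form (a textbook result from the EL literature, e.g. Owen 1988, included here only as background; it is not part of the warping method): maximizing $\prod_i v_i$ subject to $\sum_i v_i = 1$ and $\sum_i v_i x_i = \theta_0$ yields
$$v_i = \frac{1}{n}\cdot\frac{1}{1 + \lambda(x_i - \theta_0)}, \qquad \text{where } \lambda \text{ solves } \sum_{i=1}^n \frac{x_i - \theta_0}{1 + \lambda(x_i - \theta_0)} = 0.$$
Note that this requires solving for a data-dependent Lagrange multiplier $\lambda$, which is the layer of computation the warping approach avoids.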

In Section 2 we outline the bootstrap warping procedure and follow this with a simulation study in Section 3.

2. Bootstrap warping and hypothesis testing

The key features of bootstrap warping, as contrasted with the EL and bootstrap tilting approaches, are that it is a semi-parametric approach and that the number of parameters in the model is reduced from $n-1$ to 2, conditional on the observed data, i.e. we "warp" the observed e.d.f. $\hat F(x) = \sum_{i=1}^n I(x_i \le x)/n$ rather than treating each discretized segment on the continuum as a parameter. In addition, our resampling scheme follows the classic bootstrap multinomial resampling scheme with cell probabilities of $1/n$, versus bootstrap tilting, which requires the weights to be determined conditional on the dataset under investigation, thus adding a layer of complexity to the computational components of these problems. The direct benefit of this parameterization is computational ease without suffering the "curse of dimensionality" associated with big data scenarios. This will be described in detail below. Additionally, in terms of future work, covariate adjustments may be made through the warping model parameters, thus extending the utility of this approach to more complex settings.

In terms of testing $H_0: T(F_0) = \theta_0$ versus $H_1: T(F_0) \ne \theta_0$ we need to first define $T(\hat F_0)$ relative to obtaining an empirical version of the constraint $T(F_0) = \theta_0$. Towards this end we define the warped empirical estimator of $T(F_0)$ based on formulation (1.1) as
$$T(\hat F_0; \alpha, \beta) = \sum_{i=1}^n g(X_{i:n}) \left\{ K_{\alpha,\beta}\!\left[J\!\left(\tfrac{i}{n}\right)\right] - K_{\alpha,\beta}\!\left[J\!\left(\tfrac{i-1}{n}\right)\right] \right\} \tag{2.1}$$
$$= \sum_{i=1}^n g(X_{i:n}) f(X_{i:n}), \tag{2.2}$$
where $J(v) = \int_0^v j(u)\,du$, $f(X_{i:n})$ is a weighting function defined more formally below at Equation (2.5), $j(\cdot)$ is defined at (1.1) and we define $K_{\alpha,\beta}$ such that $K_{\alpha,\beta}(t) = t$ under $H_0$. For example, if we were interested in testing about $E(g(X))$ then
$$T(\hat F_0; \alpha, \beta) = \sum_{i=1}^n g(X_{i:n}) \left[ K_{\alpha,\beta}\!\left(\tfrac{i}{n}\right) - K_{\alpha,\beta}\!\left(\tfrac{i-1}{n}\right) \right] \tag{2.3}$$
and under $H_0$ true we would have $T(\hat F_0) = \sum_{i=1}^n g(X_{i:n})/n$.

Comment. Note that $T(\hat F_0; \alpha, \beta)$ at Equation (2.3) is used specifically to generate the null distribution for the estimator $T(\hat F)$ at Equation (1.2) and is not meant to be an alternative estimator for $\theta$.

The components of the weighting function in Equation (2.2), denoted as $K_{\alpha,\beta}(\cdot)$, are defined as the c.d.f. of a Kumaraswamy distribution and given as
$$K_{\alpha,\beta}(u) = 1 - (1 - u^\alpha)^\beta, \tag{2.4}$$
where $0 < u < 1$, $\alpha > 0$ and $\beta > 0$. The choice of the Kumaraswamy distribution in terms of a weighting function is due to its numerical tractability and flexibility in terms of the relative shapes it contains, i.e. our test will be sensitive to a number of alternatives given $H_1$ via the choice of this weighting function.
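For later reference (Theorem 1 below involves the Kumaraswamy density $k_{\alpha,\beta} = dK_{\alpha,\beta}$), differentiating (2.4) yields the standard form
$$k_{\alpha,\beta}(u) = \alpha\beta\, u^{\alpha-1}(1 - u^\alpha)^{\beta-1}, \qquad 0 < u < 1,$$
and note that $K_{1,1}(u) = u$, i.e. at $\alpha = \beta = 1$ the warping reduces to the identity map required under $H_0$.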

Our semi-parametric density utilized within Equation (2.2) is now defined as a discretized-type model, similar to what is used in the EL methodology and bootstrap tilting, and given as
$$f_{\alpha,\beta}(X_{i:n}) = F_{\alpha,\beta}(X_{i:n}) - F_{\alpha,\beta}(X_{i:n}^-) = K_{\alpha,\beta}\!\left[J\!\left(\tfrac{i}{n}\right)\right] - K_{\alpha,\beta}\!\left[J\!\left(\tfrac{i-1}{n}\right)\right], \tag{2.5}$$
where $F(x) = P(X \le x)$ and $F(x^-) = P(X < x)$, and $f(X_{i:n})$ represents a point mass corresponding to the $i$th order statistic. The Kumaraswamy distribution was chosen over other candidate distributions, e.g. the beta distribution, due to its well-behaved numerical properties and relatively straightforward parameterization. See Jones (2009) for a detailed description of the Kumaraswamy distribution and a description of its close relationship to the beta distribution. In essence $f(X_{i:n})$ serves as a standard weighting function such that when $\alpha = 1$ and $\beta = 1$ then $T(\hat F; \alpha, \beta)$ equates to $T(\hat F_0; \alpha, \beta)$.

The test of interest in this note is given as $H_0: T(F_0) = \theta_0$ versus $H_1: T(F_0) > \theta_0$. As in EL methods and bootstrap tilting, the first step is to maximize the constrained pseudo-likelihood
$$L_{\alpha,\beta} = \prod_{i=1}^n f_{\alpha,\beta}(X_{i:n}), \tag{2.6}$$
with respect to $\alpha$ and $\beta$ and under the constraint $H_0: T(F_0) = \theta_0$, where $f(X_{i:n})$ is defined at Equation (2.5). Clearly, $\alpha = 1$ and $\beta = 1$ given $H_0$ is true.

The bootstrap resampling scheme for our inferential method is then as follows (a code sketch is given after the list):

  1. Calculate the observed test statistic $T(\hat F)$.

  2. Obtain $\hat\alpha$ and $\hat\beta$ from Equation (2.6).

  3. Generate $B$ nonparametric bootstrap samples of size $n$, i.e. generate $n$ uniform $(0,1)$ random variables and apply $\hat Q(u) = x_{[nu]+1:n}$ to those randomly generated uniform variates.

  4. Calculate $T_i^*(\hat F_0 \,|\, \hat\alpha, \hat\beta)$ from Equation (2.2), replacing $\alpha$ with $\hat\alpha$ and $\beta$ with $\hat\beta$ from step (2), for $i = 1, 2, \ldots, B$.

  5. Calculate the approximate one-sided bootstrap p-value $p_{boot} = \sum_{i=1}^B I(T_i^*(\hat F_0 \,|\, \hat\alpha, \hat\beta) > T(\hat F))/B$, where $T(\hat F)$ is the observed estimator defined at Equation (1.2).
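To make steps (1)-(5) concrete, the following is a minimal end-to-end sketch for the simplest case of the mean ($g(x) = x$, $j(u) = 1$), assuming SciPy's SLSQP optimizer for the constrained fit of (2.6); the function names and implementation choices here are ours, not the authors'.

```python
# A minimal sketch of the bootstrap warping test for the mean case
# (g(x) = x, j(u) = 1); our construction, under stated assumptions.
import numpy as np
from scipy.optimize import minimize

def kuma_cdf(u, a, b):
    """Kumaraswamy c.d.f. K_{a,b}(u) = 1 - (1 - u^a)^b, Eq. (2.4)."""
    return 1.0 - (1.0 - u**a)**b

def warp_weights(n, a, b):
    """Point masses f_{a,b}(X_{i:n}) = K(i/n) - K((i-1)/n), Eq. (2.5) with J(v) = v."""
    grid = np.arange(n + 1) / n
    return np.diff(kuma_cdf(grid, a, b))

def fit_alpha_beta(x, theta0):
    """Step 2: maximize the pseudo-likelihood (2.6) subject to T(F0; a, b) = theta0."""
    xs = np.sort(x)
    n = len(xs)

    def neg_loglik(p):
        return -np.sum(np.log(warp_weights(n, p[0], p[1])))

    def constraint(p):
        return np.sum(xs * warp_weights(n, p[0], p[1])) - theta0

    res = minimize(neg_loglik, x0=[1.0, 1.0],
                   bounds=[(1e-4, None), (1e-4, None)],
                   constraints=[{"type": "eq", "fun": constraint}],
                   method="SLSQP")
    return res.x

def warp_test(x, theta0, B=250, rng=None):
    """One-sided bootstrap warping p-value for H0: E(X) = theta0 vs. >."""
    rng = np.random.default_rng(rng)
    xs = np.sort(x)
    n = len(xs)
    t_obs = xs.mean()                            # step 1: T(F-hat), Eq. (1.2)
    a_hat, b_hat = fit_alpha_beta(x, theta0)     # step 2
    w = warp_weights(n, a_hat, b_hat)
    t_star = np.empty(B)
    for i in range(B):                           # steps 3-4
        boot = np.sort(rng.choice(xs, size=n, replace=True))
        t_star[i] = np.sum(boot * w)             # Eq. (2.2) at (a-hat, b-hat)
    return np.mean(t_star > t_obs)               # step 5: p-boot
```

Note that drawing a bootstrap sample with replacement is equivalent to generating $n$ uniforms and applying $\hat Q(u) = x_{[nu]+1:n}$ as in step (3), and that with $\hat\alpha = \hat\beta = 1$ the weights reduce to $1/n$, so the resampled statistics are ordinary bootstrap means, as expected under $H_0$.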

For a test of $H_0: T(F_0) = \theta_0$ versus $H_1: T(F_0) < \theta_0$ simply reverse the inequality in step (5) above. For the test $H_0: T(F_0) = \theta_0$ versus $H_1: T(F_0) \ne \theta_0$ there is an added assumption of symmetry of the distribution of $T_i^*(\hat F_0 \,|\, \hat\alpha, \hat\beta)$ under $H_0$. Under this assumption the two-sided p-value is given as $p_{boot} = \sum_{i=1}^B I(|T_i^*(\hat F_0 \,|\, \hat\alpha, \hat\beta)| > |T(\hat F)|)/B$. In general, for most of the tests of interest the test statistic will have an asymptotic normal distribution, thus most two-sided tests should approximately satisfy the symmetry assumption, i.e. the statistics are based on smooth functions, which in turn lend themselves to well-behaved and symmetric bootstrap resampling distributions.

As with similar bootstrapping methodologies for inference, e.g. see Davison and Hinkley (1997), the key is that $T(\hat F_0; \alpha, \beta)$ is a consistent estimator of $T(F_0)$ under $H_0$. In the methodology presented above this holds since $\alpha = 1$ and $\beta = 1$ under $H_0$, $\hat F$ converges to $F$, and $T(\hat F; \alpha, \beta)$ converges to $T(F)$, the statistical functional of interest, given the smoothness conditions outlined earlier, e.g. see van der Vaart (1998). The large sample justification of this concept is given by the following theorem:

Theorem 1.

Under $H_0: T(F_0) = \theta_0$ and as $n \to \infty$, $\sqrt{n}(\hat\alpha - 1, \hat\beta - 1)$ has a centered bivariate normal distribution with variance-covariance matrix $B^{-1}\Sigma B^{-1}$, where $B$ is the standard maximum likelihood based information matrix associated with the Kumaraswamy density, $k_{\alpha,\beta} = dK_{\alpha,\beta}$, and $\Sigma$ is the variance-covariance matrix of a 2-dimensional random vector whose components are given by
$$\partial \log f_{\alpha,\beta}(X_{i:n})/\partial\alpha + W_\alpha(X_{i:n}), \tag{2.7}$$
$$\partial \log f_{\alpha,\beta}(X_{i:n})/\partial\beta + W_\beta(X_{i:n}), \tag{2.8}$$
where
$$W_\alpha(X_{i:n}) = \int I(F_{\alpha,\beta}(X_{i:n}) < u)\, \frac{\partial^2}{\partial\alpha\,\partial u} \log(f_{\alpha,\beta}(u))\, dF_{\alpha,\beta}(u), \tag{2.10}$$
$$W_\beta(X_{i:n}) = \int I(F_{\alpha,\beta}(X_{i:n}) < u)\, \frac{\partial^2}{\partial\beta\,\partial u} \log(f_{\alpha,\beta}(u))\, dF_{\alpha,\beta}(u). \tag{2.11}$$

Proof.

The technical details have been worked out in an elegant fashion for the case of a semi-parametric copula model with marginal distribution functions estimated by the empirical distribution function estimator. The result in Theorem 1 follows directly from the theoretical developments used in Section 4 of Genest, Ghoudi, and Rivest (1995) by simply replacing the multivariate copula function with the univariate beta density, which is essentially a special case of the higher dimension copula model. Estimates of the variance-covariance matrix $B^{-1}\Sigma B^{-1}$ are not as straightforward to obtain and we recommend bootstrap resampling for this purpose.
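One way to operationalize the recommended bootstrap estimate of $B^{-1}\Sigma B^{-1}$ is sketched below; this is our own construction (not the authors'), reusing the hypothetical `fit_alpha_beta` function from the sketch in Section 2.

```python
# A possible bootstrap estimate (our construction) of the covariance of
# sqrt(n) * (alpha-hat, beta-hat) under H0, by refitting the constrained
# pseudo-likelihood on each resample.
import numpy as np

def boot_cov(x, theta0, B=500, rng=None):
    rng = np.random.default_rng(rng)
    x = np.asarray(x)
    n = len(x)
    ests = np.empty((B, 2))
    for b in range(B):
        xb = rng.choice(x, size=n, replace=True)
        ests[b] = fit_alpha_beta(xb, theta0)   # refit (alpha, beta) per resample
    return n * np.cov(ests, rowvar=False)      # scale by n for the sqrt(n) normalization
```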

3. Simulation results

For our simulation study we focused on the trimmed mean with known statistical functional given as
$$T_\gamma(F) = \frac{1}{1 - 2\gamma} \int_{Q(\gamma)}^{Q(1-\gamma)} x\,dF = \frac{1}{1 - 2\gamma} \int_\gamma^{1-\gamma} Q(u)\,du, \qquad 0 < \gamma < 1/2.$$
Similar results in terms of behavior hold for moment estimators and kernel estimators and are not presented here. We centered our simulation study for the trimmed mean on the hypothesis test $H_0: T(F_0) = 0$ versus $H_1: T(F_0) > 0$ at type I error rate 0.05 for symmetric distributions with trimming proportions $\gamma = 0, 0.1, 0.2$ and samples of size $n = 10, 20, 50$. For each simulation result we utilized 1000 Monte Carlo replications with the number of bootstrap resamples set to $B = 250$. For the exponential distribution we tested $H_0: T(F_0) = T_\gamma(F)$ versus $H_1: T(F_0) > T_\gamma(F)$ with shifted exponential alternatives. For the $\gamma$-trimmed mean we can simplify (2.3) such that
$$T(\hat F_0; \alpha, \beta) = \sum_{i=k+1}^{n-k} g(X_{i:n}) \left[ K_{\alpha,\beta}\!\left(\tfrac{i-k}{n-2k}\right) - K_{\alpha,\beta}\!\left(\tfrac{i-1-k}{n-2k}\right) \right], \tag{3.1}$$
where $k = [n\gamma]$. For power examinations we used shift alternatives of $\delta = 0.5$ and $\delta = 1$ from $H_0$. The type I error was estimated from the 1000 Monte Carlo replications as the proportion of times $p_{boot}$ was less than the nominal level of 0.05.
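For concreteness, a minimal sketch of the warped trimmed-mean statistic (3.1) follows; this is our own code, assuming the cell arguments $(i-k)/(n-2k)$ as written above, so that at $\alpha = \beta = 1$ the weights reduce to $1/(n-2k)$ and the statistic reduces to the ordinary $\gamma$-trimmed mean.

```python
# A minimal sketch (ours, under the stated assumption on cell arguments)
# of the gamma-trimmed warped statistic in Eq. (3.1).
import numpy as np

def trimmed_warp_stat(x, gamma, a, b):
    """Warped trimmed-mean statistic, Eq. (3.1), with k = [n * gamma]."""
    xs = np.sort(x)
    n = len(xs)
    k = int(np.floor(n * gamma))
    m = n - 2 * k                              # number of retained order statistics
    grid = np.arange(m + 1) / m                # cell boundaries (i - k) / (n - 2k)
    w = np.diff(1.0 - (1.0 - grid**a)**b)      # Kumaraswamy cell masses, Eq. (2.4)
    return np.sum(xs[k:n - k] * w)             # sum over i = k+1, ..., n-k
```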

The results of our simulation study are presented in Table 1. We see that the type I error is controlled at the nominal level and that fluctuations about that level are primarily due to simulation error. The power is monotone increasing in $\delta$ and $n$. As compared with an optimal scenario such as a t-test under normality with $\delta = 0.5$ and $\gamma = 0$, the one-sample t-test has power of 0.427, 0.695 and 0.967 as compared to the warping powers of 0.369, 0.656 and 0.966 for samples of size $n$ of 10, 20 and 50, which yields relative efficiencies of 86.4%, 94.4% and 99.8%, respectively. By comparison the empirical likelihood approach yields powers of 0.414, 0.640 and 0.960 for samples of size $n$ of 10, 20 and 50 under the same scenarios. However, it should be noted that the type I error for the empirical likelihood approach was inflated at 0.084 at $\delta = 0$ and $n = 10$; hence the power value of 0.414 at this sample size is in turn inflated due to a much higher than desired type I error level as compared to the warping approach.

Table 1. Type I error set to 0.05 ($\delta = 0$) and power ($\delta > 0$) at trimming proportions $\gamma = 0, 0.1$ and 0.2.

Acknowledgments

This work was supported by Roswell Park Cancer Institute and National Cancer Institute (NCI) grant P30CA016056, NRG Oncology Statistical and Data Management Center grant U10CA180822 and IOTN Moonshot grant U24CA232979-01. We wish to thank the reviewers for their thoughtful comments, which led to an improved version of this work.
