Research Article

Comparative study on excess distribution estimation in iid settings

Received 07 Oct 2023, Accepted 18 May 2024, Published online: 07 Jun 2024

Abstract.

This study considers excess distribution estimation in iid settings. There are two approaches to this estimation: fitting the generalized Pareto distribution and fully nonparametric estimation. The fitting estimator is justified by an approximation proven in extreme value theory; however, its accuracy depends on how far into the tail the target lies. The nonparametric estimator does not rely on such an approximation and has the advantage of wide applicability. This study conducts both a theoretical and a numerical comparative study of excess distribution estimation. Asymptotic convergence rates of the two estimators are obtained, and their mean integrated squared errors are examined numerically in a simulation study. An illustrative example using Abisko rainfall amounts is presented.

Mathematics Subject Classification (2010):

1. Introduction

Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables with a continuous distribution function $F$, and suppose that $n$ is sufficiently large. Here, we consider estimating the excess distribution (ED) given by
$$F_u(x) := P(X_1 - u \le x \mid X_1 > u) = \frac{F(x+u) - F(u)}{1 - F(u)}.$$

Shimokihara and Maesono (2018) studied the asymptotic properties of a nonparametric estimator. The nonparametric estimator (NE) is the plug-in version built on the kernel distribution estimator,
$$\hat F_u(x) := \frac{\hat F(x+u) - \hat F(u)}{1 - \hat F(u)},$$
where $\hat F(x) = \frac{1}{n}\sum_{i=1}^{n} W\left(\frac{x - X_i}{h}\right)$ is the kernel distribution estimator and $W$ is the cumulative distribution function of a symmetric density $w$. The bandwidth $h$ is assumed to satisfy $h \to 0$.
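The plug-in construction of NE can be written in a few lines. This is a minimal sketch, not the implementation used in this study. It assumes NumPy and SciPy, a Gaussian kernel (so that $W$ is the standard normal distribution function), and the hypothetical function names `kernel_cdf` and `excess_cdf_ne`. Exponential data give a simple check, because memorylessness makes the ED coincide with $F$ itself.

```python
import numpy as np
from scipy.stats import norm


def kernel_cdf(x, data, h):
    """Kernel distribution estimator F_hat(x) = n^{-1} sum_i W((x - X_i) / h), with W = Phi."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return norm.cdf((x[:, None] - data[None, :]) / h).mean(axis=1)


def excess_cdf_ne(x, data, u, h):
    """Plug-in NE: (F_hat(x + u) - F_hat(u)) / (1 - F_hat(u))."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    f_hat_u = kernel_cdf(u, data, h)[0]
    return (kernel_cdf(x + u, data, h) - f_hat_u) / (1.0 - f_hat_u)


# Usage: for exponential data, F_u(1) = 1 - exp(-1) for any threshold u.
rng = np.random.default_rng(0)
sample = rng.exponential(size=20_000)
estimate = excess_cdf_ne([0.0, 1.0], sample, u=1.0, h=0.1)
print(estimate, 1.0 - np.exp(-1.0))
```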

Under some regularity conditions, the mean squared error (MSE) of NE $\hat F_u(x)$ is asymptotically equal to
$$\frac{h^4}{4}\left(\frac{1}{1-F(u)}\right)^2\left(f'(x+u) - f'(u) + F_u(x)f'(u)\right)^2\left(\int z^2 w(z)\,dz\right)^2 + n^{-1}F_u(x)\frac{1-F(x+u)}{(1-F(u))^2}$$

(see Theorem 1.1 in Shimokihara and Maesono 2018), where $f := F'$ is the density and the integration range $(-\infty, \infty)$ is omitted throughout this paper. Shimokihara and Maesono (2018) implicitly assume that both $x$ and $u$ are fixed.

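If we want to know the ED $F_u(x)$ in the tail, the Pickands–Balkema–de Haan theorem of extreme value theory is applicable. The theorem states that, as $u \to x^* := \sup(\mathrm{supp}(f))$, $F_u$ is approximated by the generalized Pareto distribution (GPD) $H_{\gamma, c_u}$ with a suitable scale $c_u > 0$, in the sense that $\sup_x |F_u(x) - H_{\gamma,c_u}(x)| \to 0$. Here
$$H_{\gamma,c}(x) := H_\gamma\left(\frac{x}{c}\right) \ \text{ for } 1 + \gamma\frac{x}{c} > 0, \qquad H_\gamma(x) := 1 - \begin{cases}(1+\gamma x)^{-1/\gamma} & \text{for } 1+\gamma x > 0 \text{ and } \gamma \in \mathbb{R}\setminus\{0\},\\ \exp(-x) & \text{for } x \in \mathbb{R} \text{ and } \gamma = 0.\end{cases}$$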

Thus, parametrically fitting the GPD to the ED is justified, and $H_{\hat{\boldsymbol\gamma}_u}(x) := H_{\hat\gamma}(x/\hat c_u)$ for $1 + \hat\gamma x/\hat c_u > 0$ provides good estimates for sufficiently large $u$. We call this parametric estimator PE.
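The GPD distribution function $H_{\gamma,c}(x) = H_\gamma(x/c)$ that PE relies on can be coded directly from its definition. The sketch below is plain Python, and the name `gpd_cdf` is hypothetical. It raises an error outside the region $1 + \gamma x/c > 0$ where the formula is defined, and it treats $\gamma = 0$ as the exponential case. The usage line checks the Hall-class identity: with $\gamma = \alpha^{-1}$ and $c = \gamma t$, we get $H_{\gamma,c}(x) = 1 - (1 + x/t)^{-\alpha}$.

```python
import math


def gpd_cdf(x, gamma, c=1.0):
    """H_{gamma,c}(x) = H_gamma(x / c), the generalized Pareto distribution function."""
    if c <= 0.0:
        raise ValueError("the scale c must be positive")
    z = x / c
    if gamma == 0.0:
        return 1.0 - math.exp(-z)  # exponential case, gamma = 0
    base = 1.0 + gamma * z
    if base <= 0.0:
        raise ValueError("the formula requires 1 + gamma * x / c > 0")
    return 1.0 - base ** (-1.0 / gamma)


alpha, t, x = 2.0, 3.0, 1.5
print(gpd_cdf(x, 1.0 / alpha, c=t / alpha), 1.0 - (1.0 + x / t) ** (-alpha))
```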

In short, there are two main ways to estimate the ED. NE $\hat F_u$ is intended for fixed, that is, not large, $u$. The extreme value approach, on the other hand, requires large $u$. This raises the following question: how large should $u$ be for the extreme value approach? Previous studies have raised similar questions: “How far can we extrapolate into the tails?” (Smith 1987, p. 1194), and “That is, an approximation to probabilities of extreme deviation is supposed, which is assumed to become increasingly accurate as one moves further from the range of the data, but whose concise accuracy is unknown” (in the abstract of Hall and Weissman 1997). Smith (1987) answered in terms of the convergence rate, which is a function of $x$ (Remark to Theorem 8.1).

This study aims to clarify how large $u$ must be for the fitting estimator, by comparing the two ways of ED estimation. Moriyama (2021) conducted a comparative study of the extreme-value-based approach and the nonparametric approach for estimating the distribution of the sample maximum, $F^m$, and investigated both their theoretical and numerical accuracy as functions of $m$. Banfi, Cazzaniga, and De Michele (2022) compare estimators of extreme quantiles numerically.

To the best of our knowledge, this is the first comparative study of the extreme-value-based approach and the nonparametric approach in the distribution tail. To obtain explicit forms of the asymptotic errors, this study imposes assumptions on the tail of the underlying distribution. Throughout this study, suppose that F belongs to one of the following: (i) the so-called Hall class of distributions (see Hall and Welsh 1984), (ii) the Weibull class of distributions defined below, or (iii) the bounded class of distributions (see, e.g., Stupfler 2016). These classes satisfy (i) (α,β,A,B) s.t. α>0,β21,A>0,B0 and xα+β{1F(x)Axα(1+Bxβ)}0asx,

(ii) there exist $(\kappa, C)$ with $\kappa > 0$ and $C > 0$ such that $\exp(Cx^\kappa)\{1 - F(x) - \exp(-Cx^\kappa)\} \to 0$ as $x \to \infty$,

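(iii) (x,μ,σ,D,E) s.t. xR, μ<2, σ21, D > 0, E≠0 and (xx)μ+σ{1F(x)(xx)μ(D+E(xx)σ)}0asxx, respectively. Under these suppositions, the limiting GPD is of Fréchet, Gumbel, and Weibull type, respectively (see Beirlant et al. 2004), where
$$\gamma := \begin{cases}\alpha^{-1} & \text{for the Hall class},\\ 0 & \text{for the Weibull class},\\ -\mu^{-1} & \text{for the bounded class},\end{cases}\qquad c_u := \begin{cases}\gamma u & \text{for the Hall class},\\ C^{-1}\kappa^{-1}u^{1-\kappa} & \text{for the Weibull class},\\ -\gamma(x^*-u) & \text{for the bounded class},\end{cases}$$
under $\kappa \le 1$.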

Sections 2 and 3 give the asymptotic properties of NE, the kernel-type estimator, and of PE, the fitting estimator to the GPD, respectively. Both sections suppose that, as $n \to \infty$, $u := u_n \to x^*$, and that $x := x_n$ tends to $\infty$ for the Hall class or the Weibull class and to $0$ for the bounded class. Section 4 presents the results of the numerical comparative study and, in some cases, the asymptotic convergence rates of the two estimators. Section 5 presents a real-data study. The proofs of the theoretical results are given in the Appendix.

2. Kernel-type estimation

The following theorem on the MSE of NE is a consequence of Theorem 1.1 in Shimokihara and Maesono (2018). Here and throughout this article, all asymptotic notation refers to $n \to \infty$.

Theorem 1.

Suppose F is continuously twice differentiable at x. If z2w(z)dz< and {hxκ10fortheWeibullclassh(xx)10andh(xx)10fortheboundedclass, Fu2(x)E[(Fu(x)F̂u(x))2](Unh2ξn2z2w(z)dz)2+Un(1Un)vnn,

where Un:={(1+xu)αfortheHallclassexp(C{(x+u)κuκ})fortheWeibullclass(1xxu)μfortheboundedclass,ξn:={α(α+1)u2{(xu+1)2+12(xu+1)α}fortheHallclassκ2C2{(x+u)2κ2u2κ2}fortheWeibullclassμ(μ+1)(xu)2×{(1xxu)2+12(1xxu)μ}fortheboundedclass,vn:={A1uαfortheHallclassexp(Cuκ)fortheWeibullclassD1(xu)μfortheboundedclass.

The following Corollary 1 on the bandwidth minimizing the MSE follows from Theorem 1.

Corollary 1.

Suppose |zW(z)w(z)dz|< and Un2ξn2ωnn10, where ωn:={A1αuα1((1Un)2+Un1+γ)fortheHallclassκCexp(Cuκ)(uκ1(1Un)2+(x+u)κ1Un)fortheWeibullclassD1μ(xu)μ1((1Un)2+Un1+γ)fortheboundedclass.

Under the assumptions of Theorem 1, the optimal bandwidth in the sense of the MSE is
$$h = \left(2U_n^{-2}\xi_n^{-2}\omega_n n^{-1}\frac{\int zW(z)w(z)\,dz}{\left(\int z^2w(z)\,dz\right)^2}\right)^{1/3}.$$

With the optimal bandwidth, $F_u(x) - \hat F_u(x)$ is asymptotically non-degenerate normal with asymptotic mean $\nu_0 n^{-2/3}U_n^{-1/3}\xi_n^{-1/3}\omega_n^{2/3}$,

where $\nu_0 := \left(2\int z^2w(z)\,dz\right)^{-1/3}\left(\int zW(z)w(z)\,dz\right)^{2/3}$.
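As a numerical illustration of Corollary 1, the kernel constants $\int z^2w(z)\,dz$ and $\int zW(z)w(z)\,dz$ can be computed by quadrature. They can then be plugged into $h = (2U_n^{-2}\xi_n^{-2}\omega_n n^{-1}\int zWw/(\int z^2w)^2)^{1/3}$ and $\nu_0 = (2\int z^2w)^{-1/3}(\int zWw)^{2/3}$.

The sketch makes the following assumptions:
- A Gaussian kernel and SciPy are used.
- $U_n$, $\xi_n$, $\omega_n$ and $v_n$ are passed in as made-up plain numbers.
- The MSE has the form $U_n^2h^4\xi_n^2(\int z^2w)^2/4 + n^{-1}(U_n(1-U_n)v_n - 2h\omega_n\int zWw)$, whose minimizer is the $h$ above.
- The function names are hypothetical.

For the Gaussian kernel, integration by parts gives $\int zW(z)w(z)\,dz = \int w(z)^2\,dz = 1/(2\sqrt{\pi})$, which the quadrature should reproduce.

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm


def kernel_constants(w=norm.pdf, W=norm.cdf):
    """Return (int z^2 w(z) dz, int z W(z) w(z) dz) by numerical quadrature."""
    m2, _ = integrate.quad(lambda z: z**2 * w(z), -np.inf, np.inf)
    mW, _ = integrate.quad(lambda z: z * W(z) * w(z), -np.inf, np.inf)
    return m2, mW


def optimal_bandwidth(n, U_n, xi_n, omega_n, m2, mW):
    """Minimizer of U_n^2 xi_n^2 m2^2 h^4 / 4 - 2 h omega_n mW / n over h."""
    return (2.0 * omega_n * mW / (n * U_n**2 * xi_n**2 * m2**2)) ** (1.0 / 3.0)


def nu0(m2, mW):
    """nu_0 = (2 m2)^{-1/3} mW^{2/3}."""
    return (2.0 * m2) ** (-1.0 / 3.0) * mW ** (2.0 / 3.0)


m2, mW = kernel_constants()
print(m2, mW, 1.0 / (2.0 * np.sqrt(np.pi)))

# Illustrative (made-up) values, not taken from this study
n, U_n, xi_n, omega_n, v_n = 1000, 0.5, 1.0, 1.0, 1.0


def amse(h):
    """Squared bias plus variance, including the -2h variance correction."""
    bias_sq = (U_n * h**2 * xi_n * m2 / 2.0) ** 2
    var = (U_n * (1.0 - U_n) * v_n - 2.0 * h * omega_n * mW) / n
    return bias_sq + var


h_opt = optimal_bandwidth(n, U_n, xi_n, omega_n, m2, mW)
print(h_opt, amse(h_opt), nu0(m2, mW))
```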

The following Corollary 2 states the special case of Corollary 1 in which $U_n$ converges to a positive constant.

Corollary 2.

Suppose δ>0 s.t. Unδ. Under the assumptions of Corollary 1, the asymptotically optimal bandwidth in the sense of the MSE is h=(2δ2{(1δ)2+δ1+γ}n1zW(z)w(z)dz(z2w(z)dz)2)1/3×{A1/3α1/3(α+1)2/3u(α+3)/3(δ2γ+12δ)2/3fortheHallclassκ1C1/3exp(Cuκ/3)(lnδ)2/3u1(κ/3)fortheWeibullclassD1/3μ1/3(μ+1)2/3(xu)(μ+3)/3(δ2γ+12δ)2/3fortheboundedclass.

F̂u(x) with the optimal bandwidth has the asymptotic bias ν0δ1/3((1δ)2+δ1+γ)2/3n2/3×{A2/3α1/3(α+1)1/3u2α/3(δ2γ+12δ)1/3fortheHallclassC1/3(lnδ)1/3exp(2Cuκ/3)uκ/3fortheWeibullclassD2/3μ1/3(μ+1)1/3(xu)2μ/3(δ2γ+12δ)1/3fortheboundedclass.

The twice differentiability required in Theorem 1 is the usual regularity condition in smooth distribution estimation and density estimation (see, e.g., Wand and Jones 1995).

Remark 1.

A point $x$ satisfying $\lim_{n\to\infty} h(x^*-x)^{-1} > 0$ is called a boundary point in naive kernel distribution estimation, and there the convergence rate of $\hat F_u(x)$ changes (see the proof of Theorem 1). Theorem 1 requires that μ is an integer or μ < 2.

3. Fitting estimator to GPD

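To fit the GPD, we employ maximum likelihood estimation (MLE) based on the peaks-over-threshold (POT) method developed by Pickands (1975). Let $t := t_n$ be the POT threshold and let $N$ be the number of $X_1, X_2, \ldots, X_n$ exceeding $t$. Let $Y_j$ be the excess over $t$ of the $j$th of $X_1, X_2, \ldots, X_n$ exceeding $t$ ($j = 1, \ldots, N$). It holds that $N/N_* \xrightarrow{p} 1$, where $N_* := n(1 - F(t))$ and $\xrightarrow{p}$ denotes convergence in probability. Set $\boldsymbol\gamma_t := (\gamma_t, c_t)^\top$ and $\hat{\boldsymbol\gamma}_t := \arg\max_{\boldsymbol\gamma_t}\sum_{j=1}^{N}\ln h_{\boldsymbol\gamma_t}(Y_j)$, where $h_{\boldsymbol\gamma}$ is the density function of $H_{\boldsymbol\gamma}$. The threshold $t$ then needs to satisfy the following assumption.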
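The POT fit can be sketched with SciPy. This is an illustration under assumptions, not the estimation code of this study. `scipy.stats.genpareto` parametrizes the GPD as $1 - (1 + c\,x/\mathrm{scale})^{-1/c}$, so its shape `c` plays the role of γ and its `scale` the role of $c_t$. Fixing `floc=0` fits the distribution to the excesses $Y_j$. The helper names are hypothetical. As a sanity check, the excesses over $t$ of an exact Pareto sample with $1 - F(x) = x^{-2}$ are exactly GPD with $\gamma = 1/2$ and $c_t = t/2$.

```python
import numpy as np
from scipy.stats import genpareto


def fit_gpd_pot(data, t):
    """MLE of (gamma_t, c_t) from the excesses over the threshold t."""
    excesses = data[data > t] - t
    gamma_hat, _, c_hat = genpareto.fit(excesses, floc=0.0)
    return gamma_hat, c_hat, excesses.size


def excess_cdf_pe(x, gamma_hat, c_hat):
    """PE: the fitted GPD distribution function H_{gamma_hat, c_hat}(x)."""
    return genpareto.cdf(x, gamma_hat, loc=0.0, scale=c_hat)


rng = np.random.default_rng(1)
n, t = 200_000, 10.0
sample = (1.0 - rng.random(n)) ** -0.5  # Pareto: 1 - F(x) = x^{-2}, x >= 1
gamma_hat, c_hat, N = fit_gpd_pot(sample, t)
print(N, gamma_hat, c_hat)  # N should be close to N_* = n * t^{-2} = 2000
```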

Assumption 1.

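Either (i) $(U_n, T_n) \to (0, 0)$, or (ii) there exists $\delta > 0$ such that both $U_n \to \delta$ and $T_n \to \delta$ hold, where
$$T_n := \begin{cases}(1 + x/t)^{-\alpha} & \text{for the Hall class},\\ \exp(-C\kappa t^{\kappa-1}x) & \text{for the Weibull class},\\ \left(1 - \frac{x}{x^*-t}\right)^{\mu} & \text{for the bounded class}.\end{cases}$$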

PE, the fitting estimator, fundamentally depends on the approximation given by the Pickands–Balkema–de Haan theorem. This approximation is stated in the following Proposition 1, and Assumption 1 ensures its convergence.

Proposition 1.

Under Assumption 1, $\tau_n := F_u(x) - H_{\boldsymbol\gamma_t}(x) \to 0$.

Remark 2.

The condition (i) in Assumption 1 Un0{u=o(x)fortheHallclassu=o(x(1κ)1)orκ1fortheWeibullclass(xu)1=o(x1)fortheboundedclass restricts the threshold t to being asymptotically same as u in the following sense {t=o(x)fortheHallclasst=o(x(1κ)1)orκ1fortheWeibullclass(xt)1=o(x1)fortheboundedclass. Unδ{u=(δγ1)1x+o(1)fortheHallclassu=[(Cx)1lnδ](κ1)1+o(x(1κ)1)fortheWeibullclassu=x+(δγ1)1x+o(x)fortheboundedclass, where the Weibull class additionally needs both x = o(u) and κ < 1. (ii) being true requires t to be asymptotically same as u in a similar sense, as (i).

Although the MLE is asymptotically efficient, various alternative approaches have been proposed and compared (Zhang 2007; Del Castillo and Serra 2015; Kang and Song 2017). Smith (1987) gave conditions under which the scaled version of the MLE, $\hat{\boldsymbol\gamma}_t' := \mathrm{diag}(1, c_t^{-1})\hat{\boldsymbol\gamma}_t$, is asymptotically normal with a non-trivial bias and $\sqrt{N}$-consistent. These conditions are stated as Assumption 2 below.

Assumption 2.

λR s.t. λnλ, where λn:=n×{A1/2tα/2fortheHallclasst2exp(Ctκ)fortheWeibullclassD1/2(xt)μ/2fortheboundedclass

The following proposition on the accuracy of PE is a consequence of Smith (1987).

Proposition 2.

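Under Assumptions 1–2,
$$E\left[\left(F_u(x) - H_{\hat{\boldsymbol\gamma}_t}(x)\right)^2\right] \approx \left(\tau_n + N^{-1/2}\lambda_n\boldsymbol\eta_n^\top\Sigma_0^{-1}\boldsymbol\mu\right)^2 + N^{-1}\left(\boldsymbol\eta_n^\top\Sigma_0^{-1}\boldsymbol\eta_n\right),$$

where $\boldsymbol\mu$ and the Fisher information matrix $\Sigma_0$ are given in Smith (1987), and $\boldsymbol\eta_n := \left(T_n\left(-T_n^{\gamma} + 1 + \gamma\ln T_n\right),\ \gamma^{-1}\left(1 - T_n^{-\gamma}\right)T_n^{1+\gamma}\right)^\top$.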

The following corollary on the convergence rate of PE immediately follows from Proposition 2.

Corollary 3.

Under the assumptions of Proposition 2, $F_u(x) - H_{\hat{\boldsymbol\gamma}_t}(x)$ converges at the rate of the larger of $\tau_n$ and $N^{-1/2}\|\boldsymbol\eta_n\|_1$.

4. Comparative study

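Throughout this section, suppose that $t = u$ and that there exists $\delta > 0$ such that $U_n \to \delta$, and define
$$u^* := \begin{cases}u & \text{for the Hall class or the Weibull class},\\ (x^*-u)^{-1} & \text{for the bounded class}.\end{cases}$$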

Set the threshold to $u^* = n^{1/8}$, $u^* = n^{1/4}$, $u^* = n^{1/2}$, or $u^* = n^{3/4}$.

**Hall and bounded classes.** Since $\tau_n \approx T_n - U_n + O((u^*)^{-\beta})$ for the Hall class, the MSE of PE converges at the rate $n^{-1}(u^*)^{\alpha} + (u^*)^{-2\beta}$. For the bounded class, the MSE of PE is of order $n^{-1}(u^*)^{\mu} + (u^*)^{-2\sigma}$. The minimum of the MSE of NE is of order $n^{-1}(u^*)^{\alpha}$ for the Hall class and $n^{-1}(u^*)^{\mu}$ for the bounded class. This holds provided that the minimizing bandwidth converges to zero, that is, $n^{-1}(u^*)^{\alpha+3} \to 0$ or $n^{-1}(u^*)^{\mu+3} \to 0$, respectively.

**Weibull class.** When $u$ is of polynomial order in $n$, the MSE of PE does not converge to zero, and the MSE of NE tends to infinity. If $u^* = (\ln n)^{1/\kappa}$, the MSE of PE converges at a rate slower than any polynomial, while the asymptotic variance $n^{C-1}$ converges (to zero when $C < 1$). In this setting, the MSE of NE is of order $n^{(4/3)(C-1)}(\ln n)^{2/3}$.

To sum up, when F belongs to the Hall class or the bounded class, NE converges at the same rate as PE or faster, provided that the optimal bandwidth converges to zero. Specifically, whether PE should be used depends on whether $n^{-1}(u^*)^{(1/|\gamma|)+3}$ converges to zero. For the Weibull class, neither estimator is consistent if the threshold $u$ is of polynomial order in $n$.
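When $u^* = n^{a}$, the rate comparison above reduces to exponent arithmetic. The sketch below uses a hypothetical helper and ignores constants and terms slower than any polynomial. It returns exponents $e$ with MSE $\asymp n^{e}$:
- for PE, from $n^{-1}(u^*)^{\alpha} + (u^*)^{-2\beta}$;
- for NE, from $n^{-1}(u^*)^{\alpha}$, only when $n^{-1}(u^*)^{\alpha+3} \to 0$.

The same function applies to the bounded class with $(\mu, \sigma)$ in place of $(\alpha, \beta)$. It is an illustration of the stated formulas, not a reproduction of Table 1.

```python
def mse_exponents(a, tail, second):
    """Exponents e (MSE ~ n^e) when u* = n^a.

    tail:   alpha (Hall class) or mu (bounded class)
    second: beta (Hall class) or sigma (bounded class)
    Returns (pe, ne); ne is None when the optimal bandwidth does not tend to zero.
    """
    pe = max(a * tail - 1.0, -2.0 * a * second)
    ne = a * tail - 1.0 if a * (tail + 3.0) - 1.0 < 0.0 else None
    return pe, ne


# Burr distribution with c = 2 and ell = 1, i.e. alpha = 2 and beta = 2
for a in (1 / 8, 1 / 4, 1 / 2):
    print(a, mse_exponents(a, tail=2.0, second=2.0))
```

A non-negative PE exponent means that PE is not consistent for that choice of $a$.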

Next, the underlying distribution F was taken to be one of the following:
- a Burr distribution, defined by $1 - F(x) = (1 + x^c)^{-\ell}$, for which $\alpha = c\ell$ and $\beta = c$;
- a Weibull distribution;
- an inverse Burr distribution on $x < 0$, for which $\mu = c\ell$ and $\sigma = 1/c$.

Table 1 summarizes the parameters of the underlying distributions and the convergence rates of the MSE, omitting terms slower than any polynomial. The tail index γ is $\alpha^{-1}$, zero, and $-\mu^{-1}$, respectively. A hyphen means that the distribution violates this study's assumptions for the estimator. The Weibull class with γ = 0 violates the assumptions of both PE and NE.

For $u^* = n^{1/8}$ (small relative to $n$) and γ far from zero, the MSE of NE converges fast, and especially close to $n^{-1}$ when α or μ is close to zero. In the same setting, the MSE of PE converges quite slowly. The convergence rate of PE depends on the parameters in a more complicated way than that of NE. As $u^*$ becomes relatively large, PE converges faster, but its requirements generally become more restrictive; in particular, the assumption is violated for γ close to zero. NE loses its consistency completely if $n^{1/2}(u^*)^{-1}$ converges to some constant, including zero.
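Burr samples for simulations of this kind can be drawn by inverting the survival function. If $V$ is uniform on $(0, 1]$, then $X = (V^{-1/\ell} - 1)^{1/c}$ satisfies $1 - F(x) = (1 + x^c)^{-\ell}$. The true ED then has the closed form $F_u(x) = 1 - \{(1 + (x+u)^c)/(1 + u^c)\}^{-\ell}$. The following is a minimal NumPy sketch with hypothetical names, not the simulation code of this study.

```python
import numpy as np


def burr_sample(n, c, ell, rng):
    """Draw n values with survival function 1 - F(x) = (1 + x^c)^(-ell), x >= 0."""
    v = 1.0 - rng.random(n)  # uniform on (0, 1], plays the role of 1 - F(X)
    return (v ** (-1.0 / ell) - 1.0) ** (1.0 / c)


def burr_excess_cdf(x, u, c, ell):
    """True ED F_u(x) = 1 - {(1 + (x + u)^c) / (1 + u^c)}^(-ell)."""
    x = np.asarray(x, dtype=float)
    return 1.0 - ((1.0 + (x + u) ** c) / (1.0 + u**c)) ** (-ell)


rng = np.random.default_rng(2)
sample = burr_sample(100_000, c=2.0, ell=1.0, rng=rng)
print(np.mean(sample > 1.0))  # theoretical value (1 + 1)^(-1) = 0.5
```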

Table 1. The polynomial convergence rates of the MSE of the estimators and the lengths.

To study the numerical accuracy in finite samples, we simulated the mean integrated squared error (MISE) of PE,
$$L_u^{-1}\int_{Q_u(0.1)}^{Q_u(0.9)}\left(H_{\hat{\boldsymbol\gamma}_t}(x) - F_u(x)\right)^2dx,$$
and that of NE $\hat F_u$. Here $L_u := Q_u(0.9) - Q_u(0.1)$, and $Q_u(q)$ denotes the $q$th quantile of the ED. The integration range is chosen so that $U_n = 1 - F_u(x)$ stays between 0.1 and 0.9, in line with the supposition that $U_n$ converges to a positive constant.

The simulation settings were as follows:
- **Bandwidth.** Following Corollary 2, the bandwidth estimator is $\hat h := n^{-1/3}(u^*)^{1+(\hat\gamma^{-1}/3)}$, where $\hat\gamma$ is the MLE.
- **Kernels.** The Epanechnikov kernel was used for the inverse Burr distributions and the Gaussian kernel for the other distributions.
- **Replications.** Each MISE value was computed from 10,000 replications.
- **Sample sizes and thresholds.** The sample size was $n = 2^8$ or $2^{12}$, and $u^*$ is $n^{1/8}$, $n^{1/4}$, or $n^{1/2}$.

Tables 2–5 show the mean values and their standard deviations (sd). A hyphen means that $1 - F(u)$ numerically equals zero, so the MISE value cannot be computed.
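One replication of this experiment for a Burr sample can be sketched as follows. The sketch makes the following assumptions:
- SciPy's `genpareto` provides the MLE, with its shape playing the role of γ.
- The kernel is Gaussian and $t = u$.
- The bandwidth rule is $\hat h = n^{-1/3}(u^*)^{1+\hat\gamma^{-1}/3}$, which is only meaningful for a Hall-class fit with $\hat\gamma > 0$.

The integrated squared error is normalized by $L_u$ and approximated by the trapezoidal rule. Averaging it over many replications gives the MISE. The names are hypothetical, and this is not the simulation code of this study.

```python
import numpy as np
from scipy.stats import genpareto, norm


def normalized_ise(est, true, lo, hi, m=401):
    """L_u^{-1} * integral_lo^hi (est(x) - true(x))^2 dx via the trapezoidal rule."""
    x = np.linspace(lo, hi, m)
    sq = (est(x) - true(x)) ** 2
    return float(np.sum((sq[1:] + sq[:-1]) * np.diff(x)) / 2.0 / (hi - lo))


def one_replication(n, c, ell, a, rng):
    """ISE of (NE, PE) for one Burr(c, ell) sample, with u = n^a and t = u."""
    u = n**a
    data = ((1.0 - rng.random(n)) ** (-1.0 / ell) - 1.0) ** (1.0 / c)

    def true_cdf(x):  # exact ED of the Burr distribution
        return 1.0 - ((1.0 + (x + u) ** c) / (1.0 + u**c)) ** (-ell)

    def quantile(q):  # Q_u(q), the q-th quantile of the ED
        return ((1.0 + u**c) * (1.0 - q) ** (-1.0 / ell) - 1.0) ** (1.0 / c) - u

    # PE: GPD fitted by maximum likelihood to the excesses over t = u
    gamma_hat, _, c_hat = genpareto.fit(data[data > u] - u, floc=0.0)
    if gamma_hat <= 0.0:
        raise ValueError("the bandwidth rule below assumes gamma_hat > 0")

    def pe(x):
        return genpareto.cdf(x, gamma_hat, loc=0.0, scale=c_hat)

    # NE: Gaussian-kernel plug-in estimator, h = n^{-1/3} u^{1 + 1/(3 gamma_hat)}
    h = n ** (-1.0 / 3.0) * u ** (1.0 + 1.0 / (3.0 * gamma_hat))

    def f_hat(y):
        y = np.atleast_1d(y)
        return norm.cdf((y[:, None] - data[None, :]) / h).mean(axis=1)

    f_hat_u = f_hat(u)[0]

    def ne(x):
        return (f_hat(x + u) - f_hat_u) / (1.0 - f_hat_u)

    lo, hi = quantile(0.1), quantile(0.9)
    return normalized_ise(ne, true_cdf, lo, hi), normalized_ise(pe, true_cdf, lo, hi)


rng = np.random.default_rng(3)
ise_ne, ise_pe = one_replication(2**12, c=2.0, ell=1.0, a=1 / 8, rng=rng)
print(ise_ne, ise_pe)
```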

Table 2. Scaled MISE values (×100) and sd values (×100) for the estimators.

Table 3. Scaled MISE values (×100) and sd values (×100) for the estimators.

Table 4. Scaled MISE values (×100) and sd values (×100) for the estimators.

Table 5. Scaled MISE values (×100) and sd values (×100) for the estimators.

For the Burr cases, the simulated MISE values show that, on the whole, NE surpasses PE for relatively small $u^*$ (for example, $u^* = n^{1/8}$), whereas PE is better for $u^* = n^{1/2}$. The MISE values of NE are especially large when both α and $u^*$ are large. For $u^*$ around $n^{1/4}$, the two estimators are comparable. For $u^*$ larger than $n^{1/2}$ (e.g., $u^* = n$), PE is expected to far outperform NE, which becomes useless.

For the inverse Burr cases, NE becomes inaccurate as $u^*$ becomes large relative to $n$. However, the performance of both PE and NE depends heavily not only on the size of $u^*$ but also on the tail index γ, which differs slightly from the Burr cases. For γ = −1/6 (i.e., c = 3, ℓ = 2), which is the closest to zero in this study, NE is always more accurate than PE. Conversely, even when $u^*$ is small, PE outperforms NE in some cases with γ around −1 to −3.

For the Weibull cases, $1 - F(u)$ numerically equals zero in many settings because of the light tails. In the remaining settings, PE and NE are comparable, although PE is more numerically stable. To continue the comparison for the Weibull cases, we chose the relatively smaller threshold $u^* = (\ln n)^{1/\kappa}$ and simulated the MISE values again. In this setting, $1 - F(u) \approx \exp(-C)n^{-1}$. We chose 1, 1/2, and 1/5 as the parameter C, where the tail gets lighter as C becomes small. For κ = 3 and C = 1, NE is quite accurate. PE, however, is much better than NE for κ = 1/2 and C = 1 and for κ = 10 and C = 1. In the other cases the two estimators are comparable, so we cannot conclude which one is better for the Weibull cases. For light-tailed distributions, the ED is considered to be quite sensitive to the distribution parameters. This study concludes that PE and NE are comparable for the Weibull cases; a more detailed numerical study is important future work.

5. Real data study

This section presents a real-data study. The data are Abisko rainfall amounts provided by the Abisko Scientific Research Station and are available in the mev package for the R software environment. They consist of rainfall amounts (in mm) and their dates from 1/1/1913 to 1/1/2015, shown in Figure 1. The time-series trend was analyzed using annual maxima and was found not to be statistically significant (Rudvik 2012). Kiriliouk et al. (2019) applied GPD fitting with the threshold u = 12 and obtained $\hat\gamma \approx 0$.

Figure 1. Abisko rainfall amount (in mm) from 1/1/1913 to 1/1/2015.


Extreme rainfall can cause landslides, so the probabilities of such events need to be estimated. Figure 2 shows the estimated ED functions of the Abisko rainfall amount (in mm), obtained by the nonparametric approach (solid line) and by fitting the GPD (dashed line). The difference between the two approaches is small. Figure 3 magnifies the region [7, 14], where a slight difference is visible; it is less than about 4%. In this region, the nonparametric approach tends to return larger values, which implies a more pessimistic outlook.

Figure 2. The estimated ED functions of Abisko rainfall amount (in mm) from 1/1/1913 to 1/1/2015 data by the non parametric approach (solid line) and by the fitting to the GPD (dashed line).


Figure 3. The estimated ED functions in [7, 14] of the Abisko rainfall amount (in mm) by the nonparametric approach (solid line) and by the fitting to the GPD (dashed line).


6. Conclusion and discussion

This study investigates two estimators of the ED above the threshold u, PE and NE, and compares their accuracy. The asymptotic MSEs of the estimators are derived, and a numerical study is conducted.

The theoretical investigation reveals the following:
- The threshold t, the hyperparameter of PE, needs to be asymptotically the same as u (see Assumption 1).
- The MSE of NE and its minimizing hyperparameter (the bandwidth h) are derived.
- For the Weibull class, neither estimator is consistent when u is of polynomial order in n.
- For the Hall class or the bounded class, the accuracy of the two estimators depends on both u and the parameter γ. As u becomes larger relative to n, both estimators tend to lose consistency. When u is small relative to n, NE is in general theoretically superior to PE; when u is large relative to n, PE outperforms NE.
- If γ > 0, the heavier the tail, the better NE works. If γ < 0, NE outperforms PE, especially for γ close to zero.

The simulation study largely confirms the asymptotic superiority of each estimator in its respective regime. The real-data study examines the difference between the two estimators. The difference is slight, but the nonparametric approach returns slightly larger estimated probabilities.

The result of this comparative study differs from that for distribution estimation of the sample maximum. Moriyama (2021) compared the fitting estimator to the generalized extreme value distribution with the nonparametric kernel-type estimator. That study showed that the nonparametric estimator performs well when γ ≈ 0, where the fitting estimator loses consistency. Hence, the performance of nonparametric estimation in extreme value analysis depends at least on whether the target is related to the generalized Pareto distribution or to the generalized extreme value distribution. This fact has implications for the properties of other nonparametric estimators in extreme value analysis. To improve the accuracy of extreme value inference, we need to continue clarifying the properties of nonparametric estimators.

Data availability

The dataset analyzed during the current study is available in the mev package in the R software environment.

Acknowledgments

The author appreciates the editor’s and referees’ valuable comments, which helped improve this manuscript.

Disclosure statement

The author declares that there are no conflicts of interest.

Additional information

Funding

This work was supported by JSPS KAKENHI Grant Number JP23K16850.

References

  • Banfi, F., G. Cazzaniga, and C. De Michele. 2022. Nonparametric extrapolation of extreme quantiles: a comparison study. Stochastic Environmental Research and Risk Assessment 36 (6):1579–96. doi:10.1007/s00477-021-02102-0.
  • Beirlant, J., Y. Goegebeur, J. Teugels, and J. Segers. 2004. Statistics of extremes: theory and applications. Chichester: John Wiley & Sons, Ltd.
  • Del Castillo, J., and I. Serra. 2015. Likelihood inference for generalized Pareto distribution. Computational Statistics & Data Analysis 83:116–28. doi:10.1016/j.csda.2014.10.014.
  • Hall, P., and I. Weissman. 1997. On the estimation of extreme tail probabilities. The Annals of Statistics 25:1311–26.
  • Hall, P., and A. H. Welsh. 1984. Best attainable rates of convergence for estimates of parameters of regular variation. The Annals of Statistics 12 (3):1079–84.
  • Kang, S., and J. Song. 2017. Parameter and quantile estimation for the generalized Pareto distribution in peaks over threshold framework. Journal of the Korean Statistical Society 46 (4):487–501. doi:10.1016/j.jkss.2017.02.003.
  • Kiriliouk, A., H. Rootzén, J. Segers, and J. L. Wadsworth. 2019. Peaks over thresholds modeling with multivariate generalized Pareto distributions. Technometrics 61 (1):123–35. doi:10.1080/00401706.2018.1462738.
  • Moriyama, T. 2021. Parametric and nonparametric probability distribution estimators of sample maximum, arXiv preprint, arXiv:2111.03765.
  • Pickands, J. 1975. Statistical inference using extreme order statistics. The Annals of Statistics 3 (1):119–31.
  • Rudvik, A. 2012. Dependence structures in stable mixture models with an application to extreme precipitation. Licentiate thesis, Chalmers University of Technology, Gothenburg, Sweden.
  • Shimokihara, A., and Y. Maesono. 2018. Asymptotic mean squared error of kernel estimator of excess distribution function. Bulletin of Informatics and Cybernetics 50:51–64. doi:10.5109/2233859.
  • Smith, R. L. 1987. Estimating tails of probability distributions. The Annals of Statistics 15 (3):1174–1207.
  • Stupfler, G. 2016. Estimating the conditional extreme-value index under random right-censoring. Journal of Multivariate Analysis 144:1–24. doi:10.1016/j.jmva.2015.10.015.
  • Wand, M. P., and M. C. Jones. 1995. Kernel smoothing. London: Chapman & Hall.
  • Zhang, J. 2007. Likelihood moment estimation for the generalized Pareto distribution. Australian & New Zealand Journal of Statistics 49 (1):69–77. doi:10.1111/j.1467-842X.2006.00464.x.

Appendix

Proof of Proposition 1.

For the Hall class, it follows from $\gamma = \alpha^{-1}$ and $c_t = \gamma t$ that $H_{\boldsymbol\gamma_t}(x) = 1 - (1 + x/t)^{-\alpha} = 1 - T_n$.

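Since $F_u(x) = 1 - (1 + x/u)^{-\alpha} + O\left(u^{-\beta}(1 + x/u)^{-\alpha}\right)$, we see that $\tau_n := F_u(x) - H_{\boldsymbol\gamma_t}(x) \to 0$ if either (i) $(U_n, T_n) \to (0, 0)$ or (ii) there exists $\delta > 0$ such that $U_n \to \delta$ and $T_n \to \delta$, which means $t = u = O(x)$. Then, $\tau_n \approx T_n - U_n + O(u^{-\beta})$.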

For the Weibull class, it follows from $\gamma = 0$ and $c_t = C^{-1}\kappa^{-1}t^{1-\kappa}$ that $H_{\boldsymbol\gamma_t}(x) = 1 - \exp(-C\kappa t^{\kappa-1}x) = 1 - T_n$.

Since $F_u(x) = 1 - \exp\left(-C\{(x+u)^\kappa - u^\kappa\}\right) = 1 - U_n$,

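$U_n \to 0$ when $x$ is of the same order as $u$ or larger. We also see that $U_n = \exp\left(-C\{\kappa u^{\kappa-1}x + 2^{-1}\kappa(\kappa-1)u^{\kappa-2}x^2 + \cdots\}\right) \approx \exp(-C\kappa u^{\kappa-1}x)$ if $x = o(u)$. Then, distinguishing cases according to the asymptotic behavior of $u^{\kappa-1}x$, we see that $\tau_n \to 0$ under Assumption 1.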

For the bounded class, it holds that $H_{\boldsymbol\gamma_t}(x) = 1 - \left(1 - \frac{x}{x^*-t}\right)^{\mu} = 1 - T_n$.

Since Fu(x)=1(1xxu)μ+O((xu)σ(1xxu)μ), we see τnTnUn+O((xu)μσ(xx)μ) if t = u.

Combining the results, Proposition 1 has been proved. ▪
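The Hall-class part of this proof can be checked numerically on a Burr distribution, $1 - F(x) = (1 + x^c)^{-\ell}$. For this distribution, $\alpha = c\ell$, $\beta = c$, and $H_{\boldsymbol\gamma_t}(x) = 1 - (1 + x/t)^{-\alpha}$. With $t = u$ we have $T_n = U_n$, so $\tau_n$ should be of order $u^{-\beta}$. For $c = 2$, $\ell = 1$ and $x = u$, a short calculation gives $\tau_n = -0.75/(1 + 4u^2)$ exactly, so $u^2\tau_n \to -3/16$. The helper `burr_tau` is hypothetical and only illustrates the claim.

```python
def burr_tau(u, c, ell, x=None):
    """tau_n = F_u(x) - H_{gamma_t}(x) for Burr(c, ell), with t = u and c_t = t / alpha."""
    alpha = c * ell
    x = u if x is None else x
    f_u = 1.0 - ((1.0 + (x + u) ** c) / (1.0 + u**c)) ** (-ell)
    h_gpd = 1.0 - (1.0 + x / u) ** (-alpha)
    return f_u - h_gpd


for u in (10.0, 100.0, 1000.0):
    print(u, burr_tau(u, 2.0, 1.0), u**2 * burr_tau(u, 2.0, 1.0))
```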

Proof of Proposition 2 for the Hall class or the bounded class.

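First, we decompose the difference as follows: $F_u(x) - H_{\hat{\boldsymbol\gamma}_t}(x) = [F_u(x) - H_{\boldsymbol\gamma_t}(x)] + [H_{\boldsymbol\gamma_t}(x) - H_{\hat{\boldsymbol\gamma}_t}(x)] =: \tau_n + \zeta_n$ (say).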

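It holds that $\zeta_n = H_{\boldsymbol\gamma_t}(x) - H_{\hat{\boldsymbol\gamma}_t}(x) = -\frac{\partial}{\partial\boldsymbol\gamma}H_{\boldsymbol\gamma}(x)\Big|_{\boldsymbol\gamma=\tilde{\boldsymbol\gamma}_t}(\hat{\boldsymbol\gamma}_t - \boldsymbol\gamma_t)$. Here $H_{\boldsymbol\gamma}(x) := 1 - \left(1 + (\alpha c_t)^{-1}x\right)^{-\alpha}$, and $\tilde{\boldsymbol\gamma}_t = (\tilde\gamma_t, \tilde c_t)^\top$ lies between $\hat{\boldsymbol\gamma}_t$ and $\boldsymbol\gamma_t$ with probability 1.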

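By calculating the derivative, we have $\frac{\partial}{\partial c_t}H_{\boldsymbol\gamma}(x)\Big|_{\boldsymbol\gamma=\tilde{\boldsymbol\gamma}_t} = -\frac{x}{\tilde c_t^2}\left(1 + \frac{x}{\tilde\alpha_t\tilde c_t}\right)^{-\tilde\alpha_t-1}$, where $\tilde\alpha_t := \tilde\gamma_t^{-1}$. It follows from
$$\frac{x}{\hat\alpha_t\hat c_t} - \frac{x}{\alpha_t c_t} = \frac{x}{\hat\alpha_t c_t}\left(\frac{c_t}{\hat c_t} - 1\right) + \frac{x}{\alpha_t c_t}\left(\frac{\alpha_t}{\hat\alpha_t} - 1\right) \xrightarrow{p} 0$$
that $\frac{\partial}{\partial c_t}H_{\boldsymbol\gamma}(x)\Big|_{\boldsymbol\gamma=\tilde{\boldsymbol\gamma}_t} - t_n \xrightarrow{p} 0$, where $t_n := c_t^{-1}\gamma^{-1}\left(1 - T_n^{-\gamma}\right)T_n^{1+\gamma}$.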

Similarly, it holds that αHγ(x)|γ=γ~tsnp0, where sn:=Tn(Tnγ+1+γlnTn).

Thus, $\zeta_n$ is asymptotically equivalent in distribution to $N^{-1/2}\boldsymbol\eta_n^\top\mathcal{N}$, where $\boldsymbol\eta_n = (s_n, \tilde t_n)^\top$ and $\tilde t_n := c_t t_n$. Combining these results proves Proposition 2 for the Hall class. Proposition 2 for the bounded class is proved in the same manner.
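The derivative used in this proof can be verified numerically. With $H(x) = 1 - (1 + x/(\alpha c))^{-\alpha}$, the sketch compares three quantities:
- the closed form $\partial_c H = -(x/c^2)(1 + x/(\alpha c))^{-\alpha-1}$;
- a central finite difference;
- $t_n = c^{-1}\gamma^{-1}(1 - T^{-\gamma})T^{1+\gamma}$, where $T = 1 - H(x)$ and $\gamma = 1/\alpha$.

The function names and test values are illustrative.

```python
def gpd_h(x, alpha, c):
    """H(x) = 1 - (1 + x / (alpha c))^(-alpha), the parametrization used in the proof."""
    return 1.0 - (1.0 + x / (alpha * c)) ** (-alpha)


def dh_dc(x, alpha, c):
    """Closed-form derivative -(x / c^2) (1 + x / (alpha c))^(-alpha - 1)."""
    return -(x / c**2) * (1.0 + x / (alpha * c)) ** (-alpha - 1.0)


def t_n(x, alpha, c):
    """c^{-1} gamma^{-1} (1 - T^{-gamma}) T^{1 + gamma}, with T = 1 - H(x) and gamma = 1 / alpha."""
    gamma = 1.0 / alpha
    T = 1.0 - gpd_h(x, alpha, c)
    return (1.0 - T ** (-gamma)) * T ** (1.0 + gamma) / (gamma * c)


x, alpha, c, eps = 1.5, 2.0, 1.2, 1e-6
finite_diff = (gpd_h(x, alpha, c + eps) - gpd_h(x, alpha, c - eps)) / (2.0 * eps)
print(finite_diff, dh_dc(x, alpha, c), t_n(x, alpha, c))
```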

Proof of Proposition 2 for the Weibull class.

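The identity $\zeta_n := H_{\boldsymbol\gamma_t}(x) - H_{\hat{\boldsymbol\gamma}_t}(x) = -\frac{\partial}{\partial\boldsymbol\gamma}H_{\boldsymbol\gamma}(x)\Big|_{\boldsymbol\gamma=\tilde{\boldsymbol\gamma}_t}(\hat{\boldsymbol\gamma}_t - \boldsymbol\gamma_t)$ holds, where $\tilde{\boldsymbol\gamma}_t$ lies between $\hat{\boldsymbol\gamma}_t$ and $\boldsymbol\gamma_t$ with probability 1. We have $\frac{\partial}{\partial\alpha}H_{\boldsymbol\gamma}(x)\Big|_{\boldsymbol\gamma=\boldsymbol\gamma_t} \xrightarrow{p} 0$.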

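It holds that $\frac{\partial}{\partial c_t}H_{\boldsymbol\gamma}(x)\Big|_{\boldsymbol\gamma=\boldsymbol\gamma_t} = -\frac{x}{c_t^2}\exp\left(-\frac{x}{c_t}\right)$ and $\frac{\partial}{\partial c_t}H_{\boldsymbol\gamma}(x)\Big|_{\boldsymbol\gamma=\tilde{\boldsymbol\gamma}_t} - c_t^{-1}T_n\ln T_n \xrightarrow{p} 0$.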

In the same manner as in the proof for the Hall class, $\zeta_n$ is asymptotically equivalent in distribution to $N^{-1/2}\boldsymbol\eta_n^\top\mathcal{N}$, where $\boldsymbol\eta_n = (s_n, \tilde t_n)^\top$, $s_n \equiv 0$, and $\tilde t_n := c_t t_n$. Proposition 2 for the Weibull class has now been proved. ▪

Proof of Theorem 1.

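Let $\mathbb{B}$ denote the asymptotic bias and $\mathbb{V}$ the asymptotic variance of NE $\hat F_u(x)$. Shimokihara and Maesono (2018) proved
$$\mathbb{B} := \left(\frac{1}{1-F(u)}\right)^2\left(f'(x+u) - f'(u) + F_u(x)f'(u)\right)^2\left(\int z^2w(z)\,dz\right)^2, \qquad \mathbb{V} := F_u(x)\frac{1-F(x+u)}{(1-F(u))^2}.$$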

This is seen from F̂u(x)Fu(x) is asymptotically F̂(x+u)F̂(u)1F(u)+F(x+u)F(u)(1F(u))2{F̂(u)F(u)}Fu(x)=11F(u){F̂(x+u)F(x+u)}1F(x+u)(1F(u))2{F̂(u)F(u)}, which holds when {1F(u)}1{F̂(u)F(u)}=oP(1). It is true if either F is the Hall class or {hxκ10fortheWeibullclassh(xx)10andh(xx)10fortheboundedclass, holds. The expansion gives VFu(x)1F(x+u)(1F(u))2+2h(1F(u))2(Fu2(x)f(u)+f(x+u))z2W(z)w(z)dz.

For the Hall class, B:=α2(α+1)2u4(1(xu+1)α2+1(xu+1)α)2(z2w(z)dz)2V:=A1{1(xu+1)α}uα(xu+1)α+2hA1αuα1({1(xu+1)α}2+(xu+1)α1)z2W(z)w(z)dz.

For the Weibull class, B:=(κC)2((x+u)κ2(κC(x+u)κ+κ1)exp(C{(x+u)κuκ})uκ2(κCuκ+κ1)+[1exp(C{(x+u)κuκ})]uκ2(κCuκ+κ1))2(z2w(z)dz)2(κC)4Un2{(x+u)2κ2+u2κ2}2(z2w(z)dz)2V:=[1exp(C{(x+u)κuκ})]exp(C{2uκ(x+u)κ})+2hκCexp(Cuκ)([1exp(C{(x+u)κuκ})]2uκ1+(x+u)κ1exp(C{(x+u)κuκ}))z2W(z)w(z)dz.

For the bounded class, B:=μ2(μ+1)2(xu)4(1(1xxu)μ2+1(1xxu)μ)2(z2w(z)dz)2V:=D1{1(1xxu)μ}(xu)μ(1xxu)μ+2hD1μ(xu)μ1({1(1xxu)μ}2+(1xxu)μ1)z2W(z)w(z)dz.

By combining the results, Theorem 1 is proved. ▪

Proof of Corollary 2.

It follows from Theorem 1 that E[(Fu(x)F̂u(x))2]Un2h4ξn24(z2w(z)dz)2+1n(Un(1Un)vn2hωnzW(z)w(z)dz), where ξn:={α(α+1)u2{(xu+1)2+12(xu+1)α}κ2C2{(x+u)2κ2u2κ2}μ(μ+1)(xu)2{(1xxu)2+12(1xxu)μ}vn:={A1uαexp(Cuκ)D1(xu)μωn:={A1αuα1({1(xu+1)α}2+(xu+1)α1)κCexp(Cuκ)([1exp(C{(x+u)κuκ})]2uκ1+(x+u)κ1exp(C{(x+u)κuκ}))D1μ(xu)μ1({1(1xxu)μ}2+(1xxu)μ1).

In each definition, the first case corresponds to the Hall class, the second to the Weibull class, and the last to the bounded class of distributions. Differentiating the MSE with respect to $h$ shows that the bandwidth minimizing the MSE is
$$h = \left(2U_n^{-2}\xi_n^{-2}\omega_n n^{-1}\frac{\int zW(z)w(z)\,dz}{\left(\int z^2w(z)\,dz\right)^2}\right)^{1/3}.$$

Suppose δ>0 s.t. Unδ. Then, minimizing bandwidth is h=(2δ2n1zW(z)w(z)dz(z2w(z)dz)2)1/3×{A1/3α1/3(α+1)2/3u(α5)/3{δ2γ+12δ}2/3{(1δ)2+δ1+γ}1/3(κC)1exp(Cuκ/3){(uκC1lnδ)2(2/κ)u2κ2}2/3(uκ1(1δ)2+(uκC1lnδ)1(1/κ)δ)1/3D1/3μ1/3(μ+1)2/3(xu)(μ5)/3{δ2γ+12δ}2/3{(1δ)2+δ1+γ}1/3.

F̂u(x) with the optimal bandwidth has the asymptotic bias ν0δ1/3n2/3×{A2/3α1/3(α+1)1/3u2α/3{δ2γ+12δ}1/3((1δ)2+δ1+γ)2/3exp(2Cuκ/3){(uκC1lnδ)2(2/κ)u2κ2}1/3(uκ1(1δ)2+(uκC1lnδ)1(1/κ)δ)2/3D2/3μ1/3(μ+1)1/3(xu)2μ/3{δ2γ+12δ}1/3((1δ)2+δ1+γ)2/3.

Corollary 2 has been proved. ▪