
Extreme Value Statistics in Semi-Supervised Models

Received 30 Jan 2022, Accepted 10 Mar 2024, Published online: 15 May 2024

Abstract

We consider extreme value analysis in a semi-supervised setting, where we observe, next to the n data on the target variable, n + m data on one or more covariates. This is called the semi-supervised model with n labeled and m unlabeled data. By exploiting the tail dependence between the target variable and the covariates, we derive estimators for the extreme value index and extreme quantiles of the target variable in this setting and establish their asymptotic behavior. Our estimators substantially improve on the univariate estimators based on only the n target variable data in terms of asymptotic variance, whereas the asymptotic biases remain unchanged. A simulation study confirms the substantially improved behavior of both estimators. Finally, the estimation method is applied to rainfall data in France. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.

1 Introduction

The semi-supervised model, initially introduced in machine learning, deals with unbalanced datasets, where the labeled data are harder (more expensive or more time consuming) to obtain than the unlabeled data. Consider a dataset with one variable of interest, sometimes referred to as the target variable or outcome variable, and one or more covariates. The difficulty in collecting labeled data stems from observing the target variable, whereas unlabeled data containing only the covariates, that is, with the target variable missing, can be collected easily. Semi-supervised learning focuses on uncovering the (nonlinear) relation between the target variable and the covariates. Estimates and predictions based on such relations and using the additional unlabeled data often show substantially improved performance. For example, for classification analysis see Vapnik (2013) and Zhu and Goldberg (2009); for regression analysis see Wasserman and Lafferty (2008), Azriel et al. (2022), and Chakrabortty and Cai (2018).

Semi-supervised inference aims at estimating parameters or quantities regarding the target variable in the semi-supervised model. Zhang, Brown, and Cai (2019) investigate the general semi-supervised framework and show how to use the unlabeled data to improve the estimation of the mean of the target variable; for inference on heavy tailed distributions in this framework, see Ahmed and Einmahl (2019).

Extreme value statistics deals with the estimation of parameters or quantities related to the tail of a distribution, making only semi-parametric assumptions on this tail. Consequently, most extreme value methods start with a relatively large number of observations n, but select only k ≪ n extreme observations from the full sample for statistical inference. Two techniques are often used for selecting the extreme observations: the peaks-over-threshold (POT) approach, which selects the highest k observations, and the block maxima (BM) approach, which splits the full sample into k blocks and selects the maximum of each block. Since only k observations are used in estimation, consistent estimators typically have a speed of convergence of $1/\sqrt{k}$. In practice, to obtain accurate estimators for tail parameters/quantities, one needs a sample with a relatively large sample size n to guarantee a sufficient number of extreme observations. In contrast, the semi-supervised model is greatly suitable for statistics of extremes in case data on the target variable are hard to obtain.
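To make the two selection schemes concrete, here is a minimal sketch in Python (NumPy assumed; the simulated sample and the choice of k are purely illustrative and not taken from the article):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.pareto(2.0, size=2000)   # illustrative heavy-tailed sample

k = 100

# Peaks-over-threshold (POT): keep the k largest observations of the full sample.
pot_extremes = np.sort(x)[-k:]

# Block maxima (BM): split the sample into k blocks and keep each block maximum.
blocks = np.array_split(x, k)
block_maxima = np.array([b.max() for b in blocks])

print(pot_extremes.shape, block_maxima.shape)   # (100,) (100,)
```

Both schemes reduce the effective sample size from n to k, which is why the resulting estimators converge at the rate $1/\sqrt{k}$ rather than $1/\sqrt{n}$.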

The main goal of this article is twofold. First, we derive in this semi-supervised setting a new, improved pseudo-maximum likelihood estimator (MLE) for a general extreme value index γ and establish its asymptotic behavior. This extreme value index describes the tail heaviness of a probability distribution. If γ > 0 the distribution is heavy tailed and has an infinite right endpoint, if γ = 0 the distribution is light tailed and may have an infinite or finite endpoint, and if γ < 0 the endpoint is finite; see, for example, Beirlant et al. (2004) or de Haan and Ferreira (2006) for a thorough treatment of extreme value theory and the corresponding statistical inference. Second, we use the adapted estimator of γ, together with an adapted estimator of the so-called scale, to obtain an improved estimator of an extreme quantile in the semi-supervised model. We establish the asymptotic behavior of this adapted extreme quantile estimator. Simulations show that the improvement for the new extreme quantile estimator is actually larger than that for the new estimator of γ.

For ease of explanation of our novel estimator of the extreme value index, let us assume that there is only one covariate. We first estimate γ for the variable of interest (that is, ignoring the covariate) using the pseudo-MLE γ̂, see Smith (1987) and Drees, Ferreira, and de Haan (2004). Then, we choose a number g and transform the labeled covariate data empirically (using all the labeled and unlabeled data) such that they obtain an artificial extreme value index g. Using the transformed covariates of the labeled data, we estimate the known g by the pseudo-MLE ĝ, say, and use the difference ĝ − g to adapt and substantially improve the initial estimator γ̂ of the extreme value index γ of the variable of interest. For this adaptation the tail dependence between the target variable and the covariate is crucial.

The specific contributions of this article are as follows. First, we provide a general result for the relevant, nonstandard tail quantile process (see Lemma 6.5). Based on this tail quantile process result, one could improve other estimators based on the POT approach in the semi-supervised model. Also, we impose no assumptions on the tail of the covariates. When analyzing the tail of the target variable, it is crucial to assume regularity in its tail, such as the max-domain of attraction condition in extreme value analysis. However, requiring such conditions for the covariates can be restrictive in applications. Finally, our main results are valid for a broad class of distributions for the target variable: we deal with a general extreme value index γ, which can be positive, zero, or negative. This is particularly important for applications where the sign of γ is not known beforehand. For example, when analyzing extreme weather, various studies find that the extreme value index is around zero for different meteorological variables: for hourly surge level on the English east coast (Coles and Tawn 1991), for hourly maximum wind speed in Sheffield, UK (Coles and Walshaw 1994), for wave height and still water level on the Dutch coast (de Haan and de Ronde 1998), and for daily rainfall in North Holland, The Netherlands (Buishand, de Haan, and Zhou 2008).

This article is organized as follows. In Section 2, for clarity of exposition, we first introduce our adapted estimator for the extreme value index in the semi-supervised model with one covariate and establish its asymptotic normality. Subsequently we introduce the adapted estimator of an extreme quantile and analyze its asymptotic behavior. In Section 3 we consider the general multivariate semi-supervised setting, present the adapted estimator of the extreme value index, and establish its asymptotic normality. Section 4 is devoted to a simulation study, where the improved performance, in terms of variance, of the adapted estimators compared with the initial estimators is shown. An application to rainfall in France can be found in Section 5, and the detailed proofs are deferred to Section 6 and the supplementary material.

2 Main Results: One Covariate

2.1 Estimation of the Extreme Value Index

Let F be a bivariate distribution with marginals $F_1$ and $F_2$. We assume that $F_1$ is in the max-domain of attraction of an extreme value distribution $G_\gamma$, where γ is the extreme value index, our parameter of interest. Let the pairs $(X_1,Y_1),\ldots,(X_n,Y_n)$ be a random sample from F, and let $(Y_{n+1},\ldots,Y_{n+m})$ be a random sample from $F_2$, independent of the n pairs. This is the semi-supervised model. Assume that the tail copula R of $(X_1,Y_1)$ exists:
$$R(x,y)=\lim_{t\downarrow 0}\frac{1}{t}\,P\bigl(1-F_1(X_1)\le tx,\ 1-F_2(Y_1)\le ty\bigr),\qquad (x,y)\in[0,\infty]^2\setminus\{(\infty,\infty)\}.\tag{1}$$

Denote the order statistics of $X_i,\ i=1,\ldots,n$, by $X_{1:n}\le\cdots\le X_{n:n}$, and similarly for the $Y_i,\ i=1,\ldots,n$. We estimate $\gamma>-\frac12$ with the often used pseudo-MLE γ̂ based on $X_{n-k:n},\ldots,X_{n:n}$, for $k\in\{1,\ldots,n-1\}$; see sec. 3.4 in de Haan and Ferreira (2006).

Define for $i=1,\ldots,n$,
$$\tilde Y_i=\begin{cases}\dfrac{\bigl(1-\bigl(F_{n+m}(Y_i)-\frac{1}{2(n+m)}\bigr)\bigr)^{-g}-1}{g}, & g\ne 0,\\[1.5ex] -\log\Bigl(1-\bigl(F_{n+m}(Y_i)-\tfrac{1}{2(n+m)}\bigr)\Bigr), & g=0,\end{cases}\tag{2}$$
where $F_{n+m}$ is the empirical distribution function based on $Y_l,\ l=1,\ldots,n+m$, and $g>-\frac12$ is a number we may choose that mimics an extreme value index. Let the order statistics of $\tilde Y_i,\ i=1,\ldots,n$, be denoted by $\tilde Y_{1:n}\le\cdots\le\tilde Y_{n:n}$, and let ĝ be the pseudo-MLE of g based on $\tilde Y_{n-k:n},\ldots,\tilde Y_{n:n}$, using the same k as before. Of course, since we choose and hence know g, there is no direct need to estimate it. We will show below, however, that the dependence between the difference ĝ − g and γ̂ helps to improve the estimator of γ in the semi-supervised setting.
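A minimal sketch of the transformation in (2), assuming NumPy; the array names and the helper function are illustrative, not taken from the article's supplementary code:

```python
import numpy as np

def transform_covariate(y_labeled, y_all, g):
    """Empirically transform the n labeled covariate values so that they mimic
    an extreme value index g, as in (2); y_all holds all n + m covariate values."""
    n_plus_m = len(y_all)
    # Empirical distribution function based on all n + m covariate values,
    # evaluated at the labeled observations.
    F = np.searchsorted(np.sort(y_all), y_labeled, side="right") / n_plus_m
    u = 1.0 - (F - 1.0 / (2.0 * n_plus_m))
    if g == 0.0:
        return -np.log(u)
    return (u ** (-g) - 1.0) / g
```

The small shift 1/(2(n + m)) keeps the argument of the power (or logarithm) strictly positive, even for the largest observation.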

For the asymptotic theory, we assume that $m=m(n)$ and
$$k\to\infty,\qquad \frac{k}{n}\to 0,\qquad \frac{n}{n+m}\to\nu^2\in(0,1),\qquad\text{as } n\to\infty.\tag{3}$$

We begin with establishing the joint asymptotic normality of γ̂ and ĝ, a crucial result for deriving and showing asymptotic normality of our semi-supervised estimator (SSE) of γ. For that purpose we need the usual second order condition on the marginal distribution $F_1$. Let $U_1=F_1^{-1}(1-1/\cdot)$ be the tail quantile function corresponding to $F_1$. We assume that there exist a positive scale function a, a positive or negative function A with $\lim_{t\to\infty}A(t)=0$, and $\rho\le 0$, such that for $x>0$,
$$\lim_{t\to\infty}\frac{\dfrac{U_1(tx)-U_1(t)}{a(t)}-\dfrac{x^{\gamma}-1}{\gamma}}{A(t)}=\Psi(x),\qquad\gamma\in\mathbb{R},\tag{4}$$
where
$$\Psi(x)=\begin{cases}\dfrac{x^{\gamma+\rho}-1}{\gamma+\rho}, & \rho<0,\\[1.5ex] \dfrac{1}{\gamma}x^{\gamma}\log x, & \gamma\ne\rho=0,\\[1.5ex] \dfrac12\log^2 x, & \gamma=\rho=0,\end{cases}$$
see de Haan and Ferreira (2006), p. 46.

Proposition 2.1.

Assume $\gamma>-\frac12$ and choose $g>-\frac12$. Assume that $F_2$ is continuous, that (1), (3), and (4) hold, and that $\sqrt{k}\,A(n/k)\to\lambda\in\mathbb{R}$, as $n\to\infty$. Then with probability tending to 1, there exist unique maximizers of the likelihood functions based on $\{X_i\}_{i=1}^n$ and $\{\tilde Y_i\}_{i=1}^n$, denoted as $(\hat\gamma,\hat g)$, such that
$$\bigl(\sqrt{k}(\hat\gamma-\gamma),\ \sqrt{k}(\hat g-g)\bigr)\xrightarrow{d}N\!\left(\left[\frac{\lambda(1+\gamma)}{(1-\rho)(1+\gamma-\rho)},\,0\right],\ \Sigma\right)$$
where
$$\Sigma=\begin{bmatrix}(1+\gamma)^2 & (1-\nu^2)(1+\gamma)(1+g)R_g\\[0.5ex] (1-\nu^2)(1+\gamma)(1+g)R_g & (1-\nu^2)(1+g)^2\end{bmatrix},$$
with
$$R_g=R(1,1)+\frac{g-\gamma}{\gamma+g+1}\left((2\gamma+1)\int_0^1\frac{R(s,1)}{s^{1-\gamma}}\,ds-(2g+1)\int_0^1\frac{R(1,t)}{t^{1-g}}\,dt\right).\tag{5}$$

Based on Proposition 2.1, we derive the SSE of γ. For this derivation only, take λ = 0. Then the approximate bivariate normal distribution of $(\hat\gamma,\hat g-g)$ has mean $[\gamma,0]$ and estimated covariance matrix
$$\frac1k\hat\Sigma=\frac1k\begin{bmatrix}(1+\hat\gamma)^2 & \bigl(1-\frac{n}{n+m}\bigr)(1+\hat\gamma)(1+g)\hat R_g\\[0.5ex] \bigl(1-\frac{n}{n+m}\bigr)(1+\hat\gamma)(1+g)\hat R_g & \bigl(1-\frac{n}{n+m}\bigr)(1+g)^2\end{bmatrix},$$
where $\hat R_g$ is the estimator of $R_g$, obtained by replacing γ with γ̂ and the tail copula R, at 3 places, with its natural estimator
$$\hat R(x,y)=\frac1k\sum_{i=1}^n \mathbf{1}\bigl[X_i\ge X_{n-[kx]+1:n},\ Y_i\ge Y_{n-[ky]+1:n}\bigr],\qquad x,y\ge 0,\tag{6}$$
see, for example, Drees and Huang (1998). Maximizing the thus obtained approximate likelihood function of the single "data point" $(\hat\gamma,\hat g-g)$ with respect to the unknown γ, that is, performing a generalized-least-squares type update of γ̂ with weight $\hat\Sigma_{12}/\hat\Sigma_{22}$, we obtain as SSE for γ:
$$\hat\gamma_g=\hat\gamma-\frac{1+\hat\gamma}{1+g}\,\hat R_g\,(\hat g-g).\tag{7}$$
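As an illustration, a minimal Python sketch of the empirical tail copula (6) evaluated at a point and of the adaptation step (7); the function names are illustrative, and γ̂ and ĝ are assumed to have been obtained beforehand by the pseudo-MLE (for g close to γ, Remark 2.1 below suggests that $\hat R(1,1)$ is already a reasonable stand-in for $\hat R_g$; the full $\hat R_g$ additionally requires the two integrals in (5)):

```python
import numpy as np

def tail_copula_hat(x, y, k, s=1.0, t=1.0):
    """Nonparametric tail copula estimator (6) evaluated at the point (s, t)."""
    n = len(x)
    x_thr = np.sort(x)[n - int(k * s)]   # X_{n-[ks]+1:n}, the [ks]-th largest x
    y_thr = np.sort(y)[n - int(k * t)]   # Y_{n-[kt]+1:n}, the [kt]-th largest y
    return np.sum((x >= x_thr) & (y >= y_thr)) / k

def sse_gamma(gamma_hat, g_hat, g, R_g_hat):
    """Semi-supervised estimator (7) of the extreme value index."""
    return gamma_hat - (1.0 + gamma_hat) / (1.0 + g) * R_g_hat * (g_hat - g)
```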

Now we present the main result of this subsection, the asymptotic normality of the SSE of γ.

Theorem 2.1.

Under the conditions of Proposition 2.1, as $n\to\infty$,
$$\sqrt{k}\,(\hat\gamma_g-\gamma)\xrightarrow{d}N\!\left(\frac{\lambda(1+\gamma)}{(1-\rho)(1+\gamma-\rho)},\ (1+\gamma)^2\bigl[1-(1-\nu^2)R_g^2\bigr]\right).\tag{8}$$

Remark 2.1.

Note that the asymptotic bias of the SSE $\hat\gamma_g$ is the same as that of the pseudo-MLE γ̂ (in Proposition 2.1). Therefore, when comparing both estimators we can and will focus on the (relative) reduction of the asymptotic variance, which is equal to $(1-\nu^2)R_g^2$. The value of the crucial $R_g\in[-1,1]$ depends on the known g and the unknown γ and R. Note that $R_g$ can indeed be positive, zero, or negative, and $R_g$ can exceed R(1, 1) even when R is symmetric in its arguments. Nevertheless it is appealing to consider g = γ, reducing $R_g$ to R(1, 1). Since γ is unknown, this would lead to the choice g = γ̂. We will not pursue this, however, since simulation results show that this random g leads to much smaller variance reductions than when taking a deterministic g not too far away from γ. Also observe that when g is close to γ and R is symmetric, then $R_g-R(1,1)$ is of order $(g-\gamma)^2$, that is, the variance reduction does not change much with the choice of g and is close to $(1-\nu^2)R^2(1,1)$. We will see in Section 4 through simulations that the variance reduction is substantial in "standard" semi-supervised settings.

Finally, note that if the function R is identically 0 (tail independence), $\hat\gamma_g$ has the same asymptotic behavior as γ̂. If R(x, y) > 0 for some x, y > 0 (tail dependence), then R(1, 1) > 0. In that case, for g = γ we obtain $R_g=R(1,1)$: the asymptotic variance of $\hat\gamma_g$ is smaller than that of γ̂ and it decreases with R(1, 1). In case of tail independence, simulations indicate that in finite samples there can be a variance inflation of up to about 10%; see Section D in the supplementary material. Therefore, it is recommended to apply the method in practice only when the data clearly exhibit tail dependence.

2.2 Estimation of an Extreme Quantile

Let p be a very small positive number. For the asymptotic theory this means that $p=p(n)$ with $\lim_{n\to\infty}p(n)=0$. Then the extreme quantile $x_p$ is defined by $x_p=F_1^{-1}(1-p)=U_1(1/p)$. We first estimate γ and the scale $a(n/k)$ with the pseudo-MLEs γ̂ (as above) and σ̂, based on $X_{n-k:n},\ldots,X_{n:n}$, see, for example, sec. 3.4 in de Haan and Ferreira (2006). The natural estimator of the extreme quantile $x_p$ is then given by (see Dekkers, Einmahl, and de Haan 1989; de Haan and Rootzén 1993)
$$\hat x_p=X_{n-k:n}+\hat\sigma\,\frac{\left(\frac{k}{np}\right)^{\hat\gamma}-1}{\hat\gamma}.\tag{9}$$
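A minimal sketch of (9) in Python, assuming the pseudo-MLEs γ̂ and σ̂ have already been computed (the function name is illustrative):

```python
import numpy as np

def quantile_standard(x, k, p, gamma_hat, sigma_hat):
    """Standard extreme quantile estimator (9) based on the top k observations."""
    n = len(x)
    x_nk = np.sort(x)[n - k - 1]              # threshold X_{n-k:n}
    r = k / (n * p)
    if gamma_hat == 0.0:
        return x_nk + sigma_hat * np.log(r)   # limit of (r**g - 1)/g as g -> 0
    return x_nk + sigma_hat * (r ** gamma_hat - 1.0) / gamma_hat
```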

Using (2), we see that g and $(n/k)^g$ are numbers that mimic an extreme value index and a scale. They are estimated with ĝ and $\tilde\sigma_g$, the pseudo-MLEs based on $\tilde Y_{n-k:n},\ldots,\tilde Y_{n:n}$, for the same k as above. Similar to the derivation of $\hat\gamma_g$, we find an adapted estimator of σ based on the joint asymptotic normality of σ̂ and $\tilde\sigma_g$. For this, write for $g\ne 0$
$$\begin{aligned}S_g={}&(\gamma+2)(g+2)R(1,1)-\frac{g(\gamma+1)}{\gamma}\int_0^1\frac{R(s,1)}{s}\,ds-\frac{\gamma(g+1)}{g}\int_0^1\frac{R(1,t)}{t}\,dt\\
&+\frac{(\gamma+1)^2(g+1)(2\gamma+1)}{\gamma g}\left(\frac{1}{g+1}-\frac{1}{\gamma+1}+\frac{(g+1)(g-\gamma)}{\gamma+g+1}\right)\int_0^1\frac{R(s,1)}{s^{1-\gamma}}\,ds\\
&+\frac{(\gamma+1)(g+1)^2(2g+1)}{\gamma g}\left(\frac{1}{\gamma+1}-\frac{1}{g+1}+\frac{(\gamma+1)(\gamma-g)}{\gamma+g+1}\right)\int_0^1\frac{R(1,t)}{t^{1-g}}\,dt,\end{aligned}$$
$$S_0=2(\gamma+2)R(1,1)+(3\gamma-1)\int_0^1\frac{R(1,t)}{t}\,dt+\gamma\int_0^1\frac{R(1,t)}{t}\log t\,dt-(2\gamma+1)^2\int_0^1\frac{R(s,1)}{s^{1-\gamma}}\,ds,$$
and estimate it with $\hat S_g$, obtained by replacing γ with γ̂ and R with R̂. Now the SSE for the scale σ is defined by
$$\hat\sigma_g=\hat\sigma\left(1-\frac{\hat S_g}{1+(1+g)^2}\left(\frac{\tilde\sigma_g}{(n/k)^g}-1\right)\right).$$

The adapted version of the extreme quantile estimator in (9) is now obtained by plugging in the SSEs of the extreme value index and the scale instead of the pseudo-MLEs:
$$\hat x_p^{\,g}=X_{n-k:n}+\hat\sigma_g\,\frac{\left(\frac{k}{np}\right)^{\hat\gamma_g}-1}{\hat\gamma_g}.\tag{10}$$
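Correspondingly, a sketch of (10) simply reuses the same formula with the semi-supervised components plugged in (assuming the quantile_standard helper sketched after (9) and the SSE components $\hat\gamma_g$ and $\hat\sigma_g$ are available):

```python
def quantile_sse(x, k, p, gamma_sse, sigma_sse):
    """Adapted extreme quantile estimator (10): same form as (9), with the
    semi-supervised index and scale estimators plugged in."""
    return quantile_standard(x, k, p, gamma_sse, sigma_sse)
```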

In order to establish its asymptotic normality, we first present a theorem which states the joint asymptotic normality of the three random components involved in this definition. The proofs of the results in this section are deferred to the supplementary material, Section C.

Theorem 2.2.

Assume $\gamma>-\frac12$ and choose $g>-\frac12$. Assume that $F_2$ is continuous, that (1), (3), and (4) hold, and that $\sqrt{k}\,A(n/k)\to\lambda\in\mathbb{R}$, as $n\to\infty$. Then as $n\to\infty$,
$$\sqrt{k}\left(\hat\gamma_g-\gamma,\ \frac{\hat\sigma_g}{a(n/k)}-1,\ \frac{X_{n-k:n}-U_1(n/k)}{a(n/k)}\right)\xrightarrow{d}N\!\left(\left[\frac{\lambda(\gamma+1)}{(1-\rho)(1+\gamma-\rho)},\ \frac{\rho\lambda}{(1-\rho)(1+\gamma-\rho)},\ 0\right],\ K\right),\tag{11}$$
$$K=\begin{bmatrix}(1+\gamma)^2\bigl[1-(1-\nu^2)R_g^2\bigr] & -(1+\gamma)\bigl[1+(1-\nu^2)Q\bigr] & (1-\nu^2)M_1\\[0.7ex] -(1+\gamma)\bigl[1+(1-\nu^2)Q\bigr] & 1+(1+\gamma)^2-\dfrac{(1-\nu^2)S_g^2}{1+(1+g)^2} & \gamma+(1-\nu^2)M_2\\[0.7ex] (1-\nu^2)M_1 & \gamma+(1-\nu^2)M_2 & 1\end{bmatrix},$$
where
$$Q=\frac{R_gS_g}{1+(1+g)^2}+\frac{1+\gamma}{1+g}R_gQ_1+\frac{S_g}{1+(1+g)^2}Q_2,$$
$R_g$ is as in (5),
$$Q_1:=(1+g)^2\Biggl[\frac{\gamma^2}{(g+1)(\gamma+1)}R(1,1)+\frac{\gamma}{\gamma+1}\int_0^1\frac{R(1,t)}{t}\,\frac{1-t^g}{g}\,dt+\left(\frac{2(g-\gamma)\bigl((\gamma+1)(g+1)+g\bigr)}{(\gamma+1)(g+1)(\gamma+g+1)}+\frac{2g+1}{(g+1)(\gamma+g+1)}\right)\int_0^1\frac{R(1,t)}{t^{1-g}}\,dt+\frac{(2\gamma+1)(\gamma-g)}{(\gamma+g+1)(g+1)}\int_0^1\frac{R(s,1)}{s^{1-\gamma}}\,ds\Biggr],$$
$$Q_2:=(1+\gamma)(1+g)\Biggl[\frac{g^2}{(g+1)(\gamma+1)}R(1,1)+\frac{g}{g+1}\int_0^1\frac{R(s,1)}{s}\,\frac{1-s^\gamma}{\gamma}\,ds+\left(\frac{2(\gamma-g)\bigl((\gamma+1)(g+1)+\gamma\bigr)}{(\gamma+1)(g+1)(\gamma+g+1)}+\frac{2\gamma+1}{(\gamma+1)(\gamma+g+1)}\right)\int_0^1\frac{R(s,1)}{s^{1-\gamma}}\,ds+\frac{(2g+1)(g-\gamma)}{(\gamma+g+1)(\gamma+1)}\int_0^1\frac{R(1,t)}{t^{1-g}}\,dt\Biggr],$$
$$M_1:=(1+\gamma)(1+g)R_g\left[2\int_0^1\frac{R(1,s)}{s^{1-g}}\,ds-\int_0^1\frac{R(1,s)}{s}\,\frac{1-s^g}{g}\,ds-\frac{1}{g+1}R(1,1)\right],$$
$$M_2:=\frac{1+g}{1+(1+g)^2}\,S_g\left[(2g+3)\int_0^1\frac{R(1,s)}{s^{1-g}}\,ds-\int_0^1\frac{R(1,s)}{s}\,\frac{1-s^g}{g}\,ds-\frac{g+2}{g+1}R(1,1)\right].$$

In the expressions of $Q_1,Q_2,M_1$, and $M_2$, when considering g = 0 or γ = 0, terms such as $\frac{1-t^g}{g}$ and $\frac{1-s^\gamma}{\gamma}$ should be read as their corresponding limits ($-\log t$ and $-\log s$, respectively).

From Theorem 2.2, we obtain the main result of this section, the asymptotic normality of the adapted extreme quantile estimator. Define $q_\gamma(t)=\int_1^t s^{\gamma-1}\log s\,ds$, for t > 1.

Theorem 2.3.

Assume $\gamma>-\frac12$ and choose $g>-\frac12$. Assume that $F_2$ is continuous, that (1), (3), and (4) hold, that the second-order parameter ρ is negative, or zero with γ negative, and that $\sqrt{k}\,A(n/k)\to\lambda\in\mathbb{R}$, as $n\to\infty$. Further assume $p=o(k/n)$ and $\log(np)=o(\sqrt{k})$, as $n\to\infty$. Then as $n\to\infty$,
$$\sqrt{k}\,\frac{\hat x_p^{\,g}-x_p}{a(n/k)\,q_\gamma\!\left(\frac{k}{np}\right)}\xrightarrow{d}N\!\left(\frac{\lambda(1+\gamma)}{(1-\rho)(\gamma-\rho+1)}\,b,\ v\right)\tag{12}$$
where b = 1 and $v=(1+\gamma)^2\bigl[1-(1-\nu^2)R_g^2\bigr]$, for $\gamma\ge 0>\rho$, and $b=\rho(1+2\gamma)/(\gamma+\rho)$ and
$$v=(1+\gamma)^2(1+2\gamma)-(1-\nu^2)\left[(1+\gamma)^2R_g^2+\frac{\gamma^2S_g^2}{1+(1+g)^2}-2\gamma(1+\gamma)Q-2\gamma^2M_1+2\gamma^3M_2\right],$$
for γ < 0.

Remark 2.2.

Note that the asymptotic bias is the same as that of the standard extreme quantile estimator $\hat x_p$ in (9), whereas the asymptotic variance is different. In case $\gamma\ge 0$, the relative variance reduction when using the adapted estimator is $(1-\nu^2)R_g^2\ge 0$, which is the same as the relative variance reduction of the adapted extreme value index estimator. For γ < 0 it is not clear if the variance reduction is nonnegative, but in the simulation study below we show substantial variance reductions, irrespective of the sign of γ. A joint variance reduction approach, instead of improving γ̂ and σ̂ separately, would lead to a nonnegative variance reduction for all $\gamma>-\frac12$, but this is beyond the scope of this article.

3 Main Results: Multiple Covariates

In this section we consider the more general situation with d − 1 covariates, where d > 2. Consider a d-variate distribution F, with marginals $F_1,\ldots,F_d$. We assume again that (only) $F_1$ is in the max-domain of attraction of an extreme value distribution $G_\gamma$. Let $F^{-}$ be the distribution function of the last d − 1 components of a random vector with distribution function F. Let $(X_1,Y_{1,2},\ldots,Y_{1,d}),\ldots,(X_n,Y_{n,2},\ldots,Y_{n,d})$ be a random sample of size n from F and let $(Y_{n+1,2},\ldots,Y_{n+1,d}),\ldots,(Y_{n+m,2},\ldots,Y_{n+m,d})$ be a random sample of size m from $F^{-}$, independent of the d-variate random sample of size n. This is the multivariate semi-supervised setting.

Then, for fixed $j=2,\ldots,d$, we use all data for the covariates $\{Y_{i,j}\}_{i=1}^{n+m}$ to obtain $\{\tilde Y_{i,j}\}_{i=1}^{n}$ as in (2), where we may choose a number $g>-\frac12$ that mimics an extreme value index, as before. For $k\in\{1,\ldots,n-1\}$, let, similarly as in the previous section, γ̂ and $\hat g_j$, $j=2,\ldots,d$, be the pseudo-MLEs of γ and (d − 1 times) of g, respectively. Assume the existence of the tail copula $R_{ij}$ of the ith and the jth component,
$$R_{ij}(x,y)=\lim_{t\downarrow 0}\frac1t\,P\bigl(1-F_i(Y_{1,i})\le tx,\ 1-F_j(Y_{1,j})\le ty\bigr),\tag{13}$$

where $(x,y)\in[0,\infty]^2\setminus\{(\infty,\infty)\}$, $1\le i,j\le d$. Here $Y_{1,1}$ is understood as $X_1$. Again, we first consider the joint asymptotic normality of γ̂ and $\hat g_j$, $j=2,\ldots,d$.

Proposition 3.1.

Assume $\gamma>-\frac12$ and choose $g>-\frac12$. Assume that $F_j$, $j=2,\ldots,d$, are continuous, that (3), (4), and (13) hold, and that, as $n\to\infty$, $\sqrt{k}\,A(n/k)\to\lambda\in\mathbb{R}$. Then with probability tending to 1, there exist unique maximizers of the likelihood functions based on $\{X_i\}_{i=1}^n$, $\{\tilde Y_{i,2}\}_{i=1}^n,\ldots,\{\tilde Y_{i,d}\}_{i=1}^n$, denoted as $(\hat\gamma,\hat g_2,\ldots,\hat g_d)$, such that
$$\bigl(\sqrt{k}(\hat\gamma-\gamma),\ \sqrt{k}(\hat g_2-g),\ \ldots,\ \sqrt{k}(\hat g_d-g)\bigr)\xrightarrow{d}N\!\left(\left[\frac{\lambda(\gamma+1)}{(1-\rho)(1+\gamma-\rho)},0,\ldots,0\right],\ \Sigma_d\right),$$
with $\Sigma_d=\Gamma\Gamma^T\circ H$ ("∘" is the Hadamard or entrywise product), where
$$\Gamma=\begin{bmatrix}1+\gamma\\ 1+g\\ \vdots\\ 1+g\end{bmatrix},\qquad H=\begin{bmatrix}1 & h_{12} & \cdots & h_{1d}\\ h_{12} & 1-\nu^2 & \cdots & h_{2d}\\ \vdots & \vdots & \ddots & \vdots\\ h_{1d} & h_{2d} & \cdots & 1-\nu^2\end{bmatrix},$$

$$h_{1i}=(1-\nu^2)\left[R_{1i}(1,1)+\frac{g-\gamma}{\gamma+g+1}\left[(2\gamma+1)\int_0^1\frac{R_{1i}(s,1)}{s^{1-\gamma}}\,ds-(2g+1)\int_0^1\frac{R_{1i}(1,t)}{t^{1-g}}\,dt\right]\right],\quad i=2,\ldots,d,$$
and $h_{ij}=(1-\nu^2)R_{ij}(1,1)$, $i=2,\ldots,d$, $j=i+1,\ldots,d$.

Very similarly to the bivariate case, let λ = 0 and derive the SSE of γ by using the approximate multivariate normal distribution of $(\hat\gamma,\hat g_2-g,\ldots,\hat g_d-g)$, with mean $[\gamma,0,\ldots,0]$ and covariance matrix $\frac1k\hat\Sigma_d=\frac1k\hat\Gamma\hat\Gamma^T\circ\hat H$, where for the estimation of $R_{ij}$, $\hat R_{ij}$ is defined as in (6). By maximizing the approximate likelihood function of $(\hat\gamma,\hat g_2-g,\ldots,\hat g_d-g)$ with respect to γ, we obtain the SSE in this multivariate setting:
$$\hat\gamma_g=\hat\gamma+\frac{1+\hat\gamma}{1+g}\sum_{j=2}^d\frac{\hat H^{-1}_{1j}}{\hat H^{-1}_{11}}\,(\hat g_j-g),\tag{14}$$
where $\hat H^{-1}_{ij}$ is the entry in the ith row and jth column of the inverse of the matrix $\hat H$. The following theorem shows the asymptotic behavior of the improved estimator $\hat\gamma_g$.
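A minimal NumPy sketch of the combination step (14), assuming γ̂, the $\hat g_j$, and the estimated matrix $\hat H$ have already been computed (the function name is illustrative):

```python
import numpy as np

def sse_gamma_multi(gamma_hat, g_hats, g, H_hat):
    """Multivariate semi-supervised estimator (14): combine the d-1 auxiliary
    estimators g_hats using the first row of the inverse of H_hat."""
    H_inv = np.linalg.inv(H_hat)
    weights = H_inv[0, 1:] / H_inv[0, 0]   # H^{-1}_{1j} / H^{-1}_{11}, j = 2,...,d
    correction = (1.0 + gamma_hat) / (1.0 + g) * np.sum(weights * (np.asarray(g_hats) - g))
    return gamma_hat + correction
```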

Theorem 3.1.

Assume that H is invertible. Then under the conditions of Proposition 3.1, as $n\to\infty$,
$$\sqrt{k}\,(\hat\gamma_g-\gamma)\xrightarrow{d}N\!\left(\frac{\lambda(1+\gamma)}{(1-\rho)(1+\gamma-\rho)},\ \sigma^2\right),\tag{15}$$
where
$$\sigma^2=(1+\gamma)^2\left(1+\frac{1}{(H^{-1}_{11})^2}\left[2\sum_{i=1}^{d}\sum_{j=i+1}^{d}H^{-1}_{1i}H^{-1}_{1j}h_{ij}+(1-\nu^2)\sum_{j=2}^{d}(H^{-1}_{1j})^2\right]\right).$$

Remark 3.1.

It can be shown that increasing the number of covariates leads to a lower (or the same) asymptotic variance. More specifically, the quality of the estimation improves if we replace the "one-covariate" estimator $\hat\gamma_g$ in (7) by the present one. In particular, if we consider d = 3 and for simplicity g = γ and $R_{12}(1,1)=R_{13}(1,1)$, the asymptotic relative variance reduction in this case is equal to $2(1-\nu^2)R_{12}^2(1,1)/(1+R_{23}(1,1))$, see the formula for the asymptotic relative variance reduction in Ahmed and Einmahl (2019), which applies in this case. Consequently, the maximal variance reduction is achieved when $R_{23}(1,1)$ takes its minimal attainable value. Decreasing the tail dependence of the covariates will make them, as a pair, more relevant and hence will make the variance reduction larger. See also the simulations in the trivariate case at the end of the next section.

4 Simulation Study

Here we perform a detailed simulation study, mainly for the one-covariate setting; at the end of this section we briefly consider two covariates.

First we investigate the finite sample performance of our novel SSE of γ, and then we compare in detail the variances of the SSE with those of the pseudo-MLE based on $X_{n-k:n},\ldots,X_{n:n}$ only. In addition, we compare the variances of the adapted extreme quantile estimator with those of the standard estimator in (9).

We begin with simulating data from the bivariate Cauchy distribution restricted to the first quadrant. This Cauchy density is proportional to $(1+\mathbf{x}S^{-1}\mathbf{x}^T)^{-3/2}$, where S is a 2 × 2 scale matrix with 1 on the diagonal and s off-diagonal. For s we alternately take two values: 0 and 0.8. For the value 0.8 we have rather strong tail dependence, and for s = 0 the tail dependence is somewhat weaker. More precisely, the value 0 corresponds to an R(1, 1)-value of 0.59, whereas for s = 0.8 this value is 0.76. We expect that stronger tail dependence leads to a larger variance reduction. These data are denoted by $(\check X_i,Y_i)$. To obtain our data $(X_i, Y_i)$, where the $X_i$ have extreme value index γ, we transform the $\check X_i$ as follows
$$X_i=\begin{cases}\dfrac{\bigl(1-F_s(\check X_i)\bigr)^{-\gamma}-1}{\gamma}, & \gamma\ne 0,\\[1.5ex] -\log\bigl(1-F_s(\check X_i)\bigr), & \gamma=0,\end{cases}\tag{16}$$
where $F_s$ is the distribution function of $\check X_i$. Simulations are performed for values of γ that are negative, positive, or 0.
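A minimal sketch of this data-generating step in Python (NumPy assumed). The rejection-sampling construction and the approximation of $F_s$ by the empirical distribution function of a large auxiliary sample are illustrative choices, not necessarily the implementation used for the article:

```python
import numpy as np

rng = np.random.default_rng(1)

def rbicauchy_first_quadrant(size, s, rng):
    """Bivariate Cauchy with scale matrix [[1, s], [s, 1]], restricted to the
    first quadrant by rejection sampling."""
    L = np.linalg.cholesky(np.array([[1.0, s], [s, 1.0]]))
    out = np.empty((0, 2))
    while len(out) < size:
        z = rng.standard_normal((4 * size, 2)) @ L.T
        w = rng.chisquare(1, size=4 * size)
        c = z / np.sqrt(w)[:, None]        # multivariate t with 1 df = Cauchy
        out = np.vstack([out, c[(c[:, 0] > 0) & (c[:, 1] > 0)]])
    return out[:size]

def transform_to_gamma(x_check, gamma, x_ref):
    """Transformation (16); here F_s is approximated by the empirical
    distribution function of a large reference sample x_ref."""
    F = np.searchsorted(np.sort(x_ref), x_check, side="right") / (len(x_ref) + 1)
    if gamma == 0.0:
        return -np.log(1.0 - F)
    return ((1.0 - F) ** (-gamma) - 1.0) / gamma
```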

First, we generate 500 samples of sizes n = 500, m = 1000, for s = 0.8, and estimate γ using both the SSE and the pseudo-MLE. We depict the root mean squared error (RMSE) based on these 500 samples as a function of k. We consider γ = −0.25, 0, and 0.25, and take g = 0. The RMSE of the SSE (indicated by AMLE in Figure 1) is indeed substantially lower than that of the pseudo-MLE for the different values of γ.


Fig. 1 RMSE using the pseudo-MLE and the SSE-MLE. From left to right: γ=−0.25,0,0.25.

Next, we focus on the (relative) variance reduction of the SSE in comparison to the pseudo-MLE. We use the following values of n and m (and k):

  • n=1000, m = 500 (less unlabeled than labeled data) and k = 250,

  • n=1000, m = 1000 (equal number of unlabeled and labeled data) and k = 250,

  • n=500, m = 1000 (more unlabeled than labeled data) and k = 125.

Table 1 shows the empirical percentages of variance reduction for different values of γ and g. The results are based on 10,000 replications. We observe that the variance reduction ranges from 10% to more than 30%; hence, the SSE indeed has a substantially smaller variance than the pseudo-MLE. By comparing the three panels, we observe that the variance reduction increases substantially with the ratio of the number of unlabeled data m to the number of labeled data n, which is in line with the asymptotic theory. Observe that the actual choice of g does not have a large influence as long as it is somewhat close to γ, a choice that is in practice often feasible.
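The reported percentages are empirical variance reductions over the Monte Carlo replications; a minimal helper of the kind one might use to tabulate them is sketched below (the two arrays of replicated estimates are assumed to come from the pseudo-MLE and the SSE, respectively):

```python
import numpy as np

def variance_reduction_pct(estimates_mle, estimates_sse):
    """Empirical percentage of variance reduction of the SSE relative to the
    pseudo-MLE over a set of Monte Carlo replications."""
    v_mle = np.var(estimates_mle, ddof=1)
    v_sse = np.var(estimates_sse, ddof=1)
    return 100.0 * (1.0 - v_sse / v_mle)
```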

Table 1 Variance reduction for different extreme value indices.

Next we investigate in more detail the sensitivity of the variance reduction to the choice of g, using a wider range of values of g, including cases where γ and g have opposite signs. In these simulations, we take s = 0 for the bivariate Cauchy distribution. The results, based on 10,000 replications, for the aforementioned values of n, m, and k are presented in Figure 2. Generally, there is always variance reduction, but for |g − γ| relatively large the reduction is lower than when g is closer to γ. Observe that for the present range of γ (−0.3 to 0.3) the choice g = 0 yields an almost maximal variance reduction.

Fig. 2 Variance reduction for various combinations of γ and g.


Finally, we study in more detail the effect of the size of m, the number of unlabeled data, on the variance reduction; again we take s = 0. We consider the case where n = 500 and let m vary; we choose g = 0. The results are based on 500 replications. Table 2 shows that the variance reduction approximately doubles when m ranges from 500 to 10,000.

Table 2 Variance reduction for different numbers of unlabeled data m.

The last part of the one-covariate simulation study is devoted to extreme quantile estimation. We take again s = 0 for the Cauchy distribution and use 10,000 replications. For n = 500, m = 1000, and k = 125, we estimate the extreme quantile $x_p$ for p = 1/n = 0.002. Table 3 shows the variance reductions when using the adapted extreme quantile estimator in (10) relative to the standard estimator in (9), for various values of γ and g. The variance reductions are substantial, also for γ < 0, and range from 17% to 33%.

Table 3 Variance reduction for the extreme quantile estimator.

We also study the effect of using the SSEs for both the extreme value index and the scale leading to the extreme quantile estimator in (10), compared to using a hybrid approach that only uses the SSE γ̂g for the extreme value index but keeps the classical σ̂ when estimating the scale.

We take g = 0 and estimate the extreme quantile $x_p$ for p = 1/n. Figure 3 shows the remarkable effect on the variance reduction of using the adapted estimator in (10) instead of the hybrid approach. The variance reduction is doubled in most of the cases, meaning that replacing σ̂ by the SSE $\hat\sigma_g$ leads to a considerably improved performance.

Fig. 3 Relative variance reduction for the extreme quantile estimators.


To conclude the bivariate setting, we briefly compare the sizable variance reductions obtained here with the reductions obtained from the asymptotic theory. It turns out that for the present sample sizes, when estimating γ, the asymptotic theory yields even higher reductions than the ones obtained here, partly due to the variability in the necessary estimation of the tail copulas, which is not reflected in the asymptotic variance. Based on limited comparisons it seems that for positive γ, asymptotic theory and simulations match somewhat better than in case γ is negative. Theoretically, for γ ≥ 0 the asymptotic variance reduction of the extreme quantile estimator $\hat x_p^{\,g}$ is the same as that of $\hat\gamma_g$. In contrast, the simulation results for the extreme quantile estimator (Table 3), in conjunction with those for the extreme value index (Table 1), indicate that for the practically relevant extreme quantile estimation, the variance reductions obtained in simulations can be even higher than those inferred from the asymptotic theory.

At the end of this simulation section we briefly consider the two-covariates setting. We begin with simulating data from the trivariate Cauchy distribution restricted to the first octant. The density is proportional to $(1+\mathbf{x}S^{-1}\mathbf{x}^T)^{-2}$, where S is now a 3 × 3 scale matrix with 1 on the diagonal and s off-diagonal, but $S_{23}=S_{32}=r$. We take respectively the values s = r = 0, s = r = 0.8, and s = 0.8, r = 0.3. When s = r = 0.8, we have rather strong tail dependence, with R(1, 1)-values of 0.77, whereas for s = r = 0 the R(1, 1)-values are 0.59. The case s = 0.8, r = 0.3 yields R(1, 1)-values of 0.81, except the one between the covariates, which is 0.63. Then, similarly as in (16) in the bivariate case, we transform the first coordinate to obtain our final data $(X_i,Y_{i,2},Y_{i,3})$; we will use γ = −0.2, 0, and 0.2. We take n = 500, m = 1000, k = 125, and 10,000 replications. Table 4 shows that the variance reduction can be as high as 50%. We observe that the case s = r = 0.8 indeed yields larger improvements than the case s = r = 0 and the one-covariate setting in Table 1. Interestingly, the case s = 0.8, r = 0.3 yields even larger variance reductions than the case s = r = 0.8. The smaller tail dependence between the covariates makes them as a pair more informative and leads to the much improved performance of $\hat\gamma_g$.

Table 4 Variance reduction with two covariates.

5 Application

In this section, we demonstrate an application using the SSE for analyzing forecasted precipitation data.

The national French weather service, Météo France, produces daily forecasted precipitation (in mm) at very high resolution (0.1°×0.1°) covering the mainland of France, between 2012 and 2017. Although usually an ensemble is forecasted, we use one single forecast (an ensemble member, not a forecasted mean or median) at every grid point and every day. To improve the forecasting model, meteorologists want to check if the forecasted precipitation shares the same distribution as the observed precipitation at the same location, particularly in the right tail. Consequently, the goal of this study is to estimate quantities such as the extreme value index and extreme quantiles of the forecasted precipitation distribution. We focus on forecasting at grid points that are close to an actual weather station.

Besides the forecasted precipitation, Météo France records the actual daily precipitation at 123 weather stations, between 1980 and 2017. We pair each weather station with the forecasting grid point that is closest to the station, and regard the two as the same location. When focusing on the fall seasons (91 days per year), at the 123 locations, we have 38 years of actual precipitation data (3458 observations), with the last 6 years paired with forecasting data (546 observations). For the last 6 years, the paired data are dependent since the forecasting data are made to forecast the precipitation at the same location on the same day. Part of this dataset has been employed in a study comparing the spatial dependence structure of extreme forecasted precipitation and extreme observed precipitation in southern France; see Oesting and Naveau (2020).¹

Since it is challenging to conduct extreme value analysis, such as extreme quantile estimation, for the forecasted precipitation based on only 546 observations, we make use of the available information in the actual precipitation to improve the estimation accuracy, exploiting the semi-supervised setting. Observe that the SSE uses the tail dependence between the two datasets, but not the (marginal) tail heaviness of the observed precipitation. Therefore, the SSE of the extreme value index of the forecasted precipitation does not incorporate properties of the distribution of the observed precipitation. This justifies comparing this SSE with the MLE of the extreme value index of the observed precipitation.

To validate that our proposed methodology can be applied to this dataset, we perform two pre-tests on the actual precipitation data (3458 observations) at each station. First, we test whether the actual precipitation at each station possesses the same distribution across time, using the test statistic $T_2$ proposed in Einmahl, de Haan, and Zhou (2016) (with k = 200). Second, we test whether the extreme precipitation at each station can be regarded as independent over time, by testing whether the extremal index is significantly different from 1, using the sliding block estimator proposed in Berghaus and Bücher (2018) (with b = 80). To achieve the 5% significance level for the joint test, we exclude all stations for which either of the two tests rejects the null at the 2.5% significance level. Eventually, for 100 stations both null hypotheses are not rejected, and we apply our proposed method to these 100 stations.

We use the SSE $\hat\gamma_g$ with g = 0 to estimate the extreme value index, and compare it with the pseudo-MLE γ̂. For both estimators, we take k = 136. In particular, we estimate both the tail dependence coefficient R(1, 1) and the variance reduction factor $(1-\nu^2)R_g^2$ to demonstrate the presence of tail dependence and to evaluate the improvement when using the SSE. In addition, we estimate the "once per 10 year" extreme rainfall, that is, the quantile at the probability level 1 − 1/910, by (9) and (10), to compare the impact of using the SSE on practically relevant quantities. Last but not least, we estimate the extreme value index and the same high quantile based on the actual precipitation, using the pseudo-MLE. In that estimation, we use the top $k_1=600$ observations, which in line with (3) corresponds to a lower fraction $k_1/(n+m)$ of POT compared to the fraction k/n used for the forecasted precipitation.
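For concreteness, the probability level and the two POT fractions just mentioned work out as follows, using the sample sizes reported above (91 fall days per year, n = 546 labeled and n + m = 3458 actual observations):
$$1-\frac{1}{91\times 10}=1-\frac{1}{910}\approx 0.9989,\qquad \frac{k_1}{n+m}=\frac{600}{3458}\approx 0.17\ <\ \frac{k}{n}=\frac{136}{546}\approx 0.25.$$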

Table 5 shows the results for three selected stations. We select the three stations from very distant areas: one from the south, one from the northwest, and one from the southwest. For the station from the south, Nîmes, the estimated extreme value index is positive, indicating a heavy-tailed distribution. Taking $\nu^2=n/(n+m)$, the reduction in variance is estimated at 16%. The difference between the two estimates of the extreme value index leads to a substantial difference in the quantile estimates: the quantile estimated using the SSE exceeds the usual quantile estimate by roughly 50%. In contrast, for the station in the northwest, Dieppe, the estimated extreme value index is around zero. The difference between the two point estimates is small, with the SSE having a 17.5% variance reduction. The two quantile estimates are about the same. Finally, for the station in the southwest, Ciboure, both estimators lead to negative estimates, although not significantly different from zero. The variance reduction is at a pronounced level: 28.4%. The quantile estimate using the SSE is somewhat lower than the usual one. Observe that the quantile estimates based on the forecasting data are overall lower than those based on the observed precipitation, also when using the SSE. This suggests that the forecasting model produces forecasts with a lighter tail than that of the actual observations. We added 90% confidence intervals for the extreme value indices and extreme quantiles. Here, for the SSE approach, the "common" quantities in the asymptotic variance that have to be estimated consistently (in particular $(1+\gamma)^2$ and $a(n/k)\,q_\gamma(k/(np))$) are estimated with the minimum of the estimates obtained from the MLE and SSE procedures.

Table 5 Estimation results for three stations.

Finally, we present a heatmap of the estimated variance reduction factors across all 100 locations in Figure 4. The variance reduction using the SSE compared to the pseudo-MLE ranges from 9% to 34.8%, and is on average 18.5%.

Fig. 4 Heatmap of the variance reduction factor across 100 stations.


6 Proofs

We first present proofs for the one-covariate (bivariate) case and then extend the proofs to the multivariate case. The asymptotic normality of $\sqrt{k}(\hat\gamma-\gamma)$, the first component of the pair in Proposition 2.1, is established in Drees, Ferreira, and de Haan (2004), see also the supplementary material, Section A. However, we cannot directly use that proof, since we have to keep track of the joint behavior of γ̂ and ĝ. Nevertheless, we mimic that proof for both γ̂ and ĝ, observing that ĝ is based on dependent observations. In this respect the proof has to be adapted substantially. We begin with various lemmas which are needed for the main proofs.

Let C be a copula corresponding to the distribution function of $(X_1,Y_1)$. Let $(V_{1,1},V_{1,2}),(V_{2,1},V_{2,2}),\ldots,(V_{n,1},V_{n,2})$ be a random sample of size n from C and $V_{n+1,2},\ldots,V_{n+m,2}$ be a random sample of size m from the uniform-(0, 1) distribution, independent of the random sample from C. Clearly all the $V_{i,j},\ i=1,\ldots,n,\ j=1,2$, also have a uniform-(0, 1) distribution. Write $X_i=F_1^{-1}(1-V_{i,1}),\ i=1,\ldots,n$, and $Y_l=F_2^{-1}(1-V_{l,2}),\ l=1,\ldots,n+m$. Then $(X_i,Y_i),\ i=1,\ldots,n$, and $Y_{n+1},\ldots,Y_{n+m}$ have the distributions as specified in the beginning of Section 2.

Consider the following uniform empirical distribution functions:
$$\Gamma_{n,j}(s)=\frac1n\sum_{i=1}^n \mathbf{1}_{[0,s]}(V_{i,j}),\quad 0\le s\le 1,\ j=1,2,\qquad \Gamma_{n+m}(t)=\frac{1}{n+m}\sum_{l=1}^{n+m}\mathbf{1}_{[0,t]}(V_{l,2}),\quad 0\le t\le 1.$$

The corresponding uniform tail empirical processes are
$$w_{n,j}(s)=\sqrt{k}\left[\frac nk\Gamma_{n,j}\!\left(\frac kn s\right)-s\right],\quad j=1,2,\qquad w_{n+m}(t)=\sqrt{\frac{(n+m)k}{n}}\left[\frac nk\Gamma_{n+m}\!\left(\frac kn t\right)-t\right].$$

Define the Gaussian vector of processes $(W_1,W_2,W_3)$, where $W_j,\ j=1,2,3$, is a standard Wiener process on $[0,T],\ T>0$, with covariances:
$$\operatorname{cov}(W_1(s),W_2(t))=R(s,t),\quad \operatorname{cov}(W_1(s),W_3(t))=\nu R(s,t),\quad \operatorname{cov}(W_2(s),W_3(t))=\nu(s\wedge t),\quad 0\le s,t\le T.$$

Let I denote the identity function. Then we have on $(D[0,T])^3$, for all $0\le\delta<\frac12$,
$$\left(\frac{w_{n,1}}{I^\delta},\frac{w_{n,2}}{I^\delta},\frac{w_{n+m}}{I^\delta}\right)\xrightarrow{d}\left(\frac{W_1}{I^\delta},\frac{W_2}{I^\delta},\frac{W_3}{I^\delta}\right),\quad\text{as }n\to\infty.\tag{17}$$

The proof of (17) is given in Ahmed and Einmahl (2019); note that in there T = 1, but the proof for arbitrary T > 0 follows similarly. Now using a Skorohod construction we obtain from (17) that
$$\sup_{0<s\le T}\frac{|w_{n,j}(s)-W_j(s)|}{s^\delta}\xrightarrow{a.s.}0,\ j=1,2,\qquad\text{and}\qquad \sup_{0<s\le T}\frac{|w_{n+m}(s)-W_3(s)|}{s^\delta}\xrightarrow{a.s.}0.\tag{18}$$

The processes in (18) are different from those in (17) but we keep the same notation, since the new vector (wn,1,wn,2,wn+m) has the same distribution as the old vector and also the new vector (W1,W2,W3) has the same distribution as the old vector. In the sequel the Xi and Yi are transformations as above of the uniform-(0,1) random variables on which the wn,j are based. We continue with the processes satisfying (18).

For convenience we introduce the following notation. Let $f_n,h_n$ be positive functions on $[l_n,u_n]$. Then we write, as $n\to\infty$,
$$f_n\overset{P}{\asymp}h_n\ \Big|_{l_n}^{u_n},$$
if both $f_n(s)=O_P(h_n(s))$ and $h_n(s)=O_P(f_n(s))$ hold uniformly for $s\in[l_n,u_n]$. This notation is useful for the following lemma, see Shorack and Wellner (2009, p. 419).

Lemma 6.1.

Let $\Gamma_{n,j}^{-1},\ j=1,2$, be the empirical quantile functions corresponding to $\Gamma_{n,j},\ j=1,2$, respectively. Then, as $n\to\infty$,
$$\Gamma_{n,j}\overset{P}{\asymp}I\ \Big|_{\Gamma_{n,j}^{-1}(1/(2n))}^{1}\qquad\text{and}\qquad \Gamma_{n,j}^{-1}\overset{P}{\asymp}I\ \Big|_{1/(2n)}^{1},\quad j=1,2.$$

The following lemma states the weighted convergence of the tail quantile processes corresponding to $\Gamma_{n,j}^{-1},\ j=1,2$, to the processes $W_1$ and $W_2$ in (18).

Lemma 6.2.

Let $\Gamma_{n,j}^{-1}$ be the empirical quantile functions corresponding to $w_{n,j}$, j = 1, 2, in (18) and let $W_j$ be as in (18), j = 1, 2. Then for any $\delta<\frac12$, as $n\to\infty$,
$$\sup_{\frac{1}{2k}\le s\le 1}\frac{\left|\sqrt{k}\left(\frac nk\Gamma_{n,j}^{-1}\!\left(\frac kn s\right)-s\right)+W_j(s)\right|}{s^\delta}\xrightarrow{P}0.\tag{19}$$

Proof.

Write $v_{n,j}(s)=\sqrt{k}\left(\frac nk\Gamma_{n,j}^{-1}\!\left(\frac kn s\right)-s\right),\ j=1,2$. Theorem 2.3 in Einmahl (1992) yields, as $n\to\infty$,
$$\sup_{\frac{1}{2k}\le s\le 1}\frac{|v_{n,j}(s)-W_{n,j}(s)|}{s^\delta}\xrightarrow{P}0,\tag{20}$$
where $W_{n,j}$ is an appropriate sequence of standard Wiener processes. Let W be a standard Wiener process and let ε > 0. It is well known that there exists an η > 0 such that
$$P\left(\sup_{0<s\le\eta}\frac{|W(s)|}{s^\delta}\ge\frac{\varepsilon}{2}\right)\le\frac{\varepsilon}{2}.\tag{21}$$

Combining (20) and (21) yields, for large n,
$$P\left(\sup_{\frac{1}{2k}\le s\le\eta}\frac{|v_{n,j}(s)|}{s^\delta}\ge\varepsilon\right)\le\varepsilon.\tag{22}$$

Combining (18), (21), (22), and Lemma 1 in Vervaat (1972) yields (19). □

The next lemma is very similar to Lemma 3.1 in Drees, Ferreira, and de Haan (2004), but the lemma therein cannot be used here because we need specifically the approximation with the present $W_1$ in order to obtain the joint behavior of γ̂ and ĝ.

Lemma 6.3.

Let ε > 0. Assume that (3) and (4) hold and $\sqrt{k}\,A(n/k)=O(1)$, as $n\to\infty$. Then for suitably chosen functions a and A in (4), as $n\to\infty$,
$$\sup_{\frac{1}{2k}\le s\le 1}s^{\gamma+1/2+\varepsilon}\left|\sqrt{k}\left(\frac{X_{n-[ks]:n}-U_1(\frac nk)}{a(\frac nk)}-\frac{s^{-\gamma}-1}{\gamma}\right)-s^{-\gamma-1}W_1(s)-\sqrt{k}\,A\!\left(\frac nk\right)\Psi(s^{-1})\right|\xrightarrow{P}0.$$

Proof.

From (4) we obtain inequality (2.3.17) in de Haan and Ferreira (2006): for any θ, δ > 0 to be specified later, there exists $t_0=t_0(\theta,\delta)$ such that for all $t,tx\ge t_0$,
$$\left|\frac{\frac{U_1(tx)-U_1(t)}{a(t)}-\frac{x^\gamma-1}{\gamma}}{A(t)}-\Psi(x)\right|\le\theta\, x^{\gamma+\rho}\max(x^\delta,x^{-\delta}).$$

We replace tx by $1/\Gamma_{n,1}^{-1}(\frac kn s)$ and t by $\frac nk$. Then we have, writing $\check s=\frac nk\Gamma_{n,1}^{-1}(\frac kn s)$, with probability tending to 1, as $n\to\infty$,
$$\left|\frac{X_{n-[ks]:n}-U_1(\frac nk)}{a(\frac nk)}-\frac{\check s^{-\gamma}-1}{\gamma}-A\!\left(\frac nk\right)\Psi(\check s^{-1})\right|\le\left|A\!\left(\frac nk\right)\right|\theta\,\check s^{-\gamma-\rho}\max(\check s^{\delta},\check s^{-\delta}).\tag{23}$$

Define $f(s)=\frac{s^{-\gamma}-1}{\gamma}$. Then by a Taylor expansion, for some $\check\Theta_n(s)$ between $\check s$ and s we have $f(\check s)-f(s)=f'(s)(\check s-s)+\frac12 f''(\check\Theta_n(s))(\check s-s)^2$. Lemma 6.1 implies $\check\Theta_n\overset{P}{\asymp}I\,\big|_{\frac{1}{2k}}^{1}$ and thus $f''(\check\Theta_n)\overset{P}{\asymp}I^{-\gamma-2}\,\big|_{\frac{1}{2k}}^{1}$. Next, by Lemma 6.2 and the fact that for all $\delta_1<\frac12$, $\sup_{0\le s\le1}|W_1(s)|/s^{\delta_1}=O_P(1)$, we have that $\sup_{\frac{1}{2k}\le s\le1}(\check s-s)^2/s^{2\delta_1}=O_P(1/k)$, as $n\to\infty$. This and again Lemma 6.2 with $\delta=\delta_1$ yield, as $n\to\infty$, uniformly for all $\frac{1}{2k}\le s\le1$,
$$f(\check s)-f(s)=s^{-\gamma-1}\frac{1}{\sqrt k}\bigl(W_1(s)+s^{\delta_1}o_P(1)\bigr)+s^{-\gamma-2+2\delta_1}O_P\!\left(\frac1k\right).$$

Choose $\delta_1$ such that $\frac{1-\varepsilon}{2}<\delta_1<\frac12$. Then $\delta_1>\frac12-\varepsilon$ and $2\delta_1+\varepsilon>1$. Hence, as $n\to\infty$,
$$\sup_{\frac{1}{2k}\le s\le1}s^{-\frac32+\varepsilon+2\delta_1}=\max\Bigl(1,(2k)^{\frac32-\varepsilon-2\delta_1}\Bigr)=o(\sqrt k).$$

Therefore, as $n\to\infty$, uniformly for all $\frac{1}{2k}\le s\le1$,
$$\begin{aligned}f(\check s)&=f(s)+\frac{1}{\sqrt k}s^{-\gamma-1}\left(W_1(s)+s^{\delta_1}o_P(1)+s^{-1+2\delta_1}O_P\!\left(\frac{1}{\sqrt k}\right)\right)\\
&=f(s)+\frac{1}{\sqrt k}s^{-\gamma-1}\left(W_1(s)+s^{1/2-\varepsilon}\Bigl(s^{\delta_1-1/2+\varepsilon}o_P(1)+s^{-3/2+\varepsilon+2\delta_1}O_P\!\left(\frac{1}{\sqrt k}\right)\Bigr)\right)\\
&=\frac{s^{-\gamma}-1}{\gamma}+\frac{1}{\sqrt k}s^{-\gamma-1}\bigl(W_1(s)+s^{1/2-\varepsilon}o_P(1)\bigr).\end{aligned}\tag{24}$$

From the mean value theorem, for some $\Theta_n(s)$ between $\check s$ and s,
$$\Psi(\check s^{-1})=\Psi(s^{-1})-\frac{\Psi'(1/\Theta_n(s))}{(\Theta_n(s))^2}(\check s-s).$$

As above, $\Theta_n\overset{P}{\asymp}I\,\big|_{\frac{1}{2k}}^{1}$, which implies that as $n\to\infty$, uniformly for $\frac{1}{2k}\le s\le1$,
$$\left|\frac{\Psi'(1/\Theta_n(s))}{(\Theta_n(s))^2}\right|=s^{-\gamma-\rho-1}(1+|\log s|)\,O_P(1).$$

Hence, using Lemma 6.2 with $\delta=\delta_1$ (as above), we have uniformly for $\frac{1}{2k}\le s\le1$,
$$A\!\left(\frac nk\right)\bigl(\Psi(\check s^{-1})-\Psi(s^{-1})\bigr)=\frac{1}{\sqrt k}A\!\left(\frac nk\right)s^{-\gamma-\rho-1+\delta_1}(1+|\log s|)\,O_P(1).$$

With $\delta_1$ chosen as above, we have that as $n\to\infty$, uniformly for $\frac{1}{2k}\le s\le1$,
$$A\!\left(\frac nk\right)\Psi(\check s^{-1})=A\!\left(\frac nk\right)\Psi(s^{-1})+\frac{1}{\sqrt k}s^{-\gamma-\varepsilon-\frac12}o_P(1).\tag{25}$$

Next consider the right-hand side of (23), where we take δ < 1/2. Using Lemma 6.1, it can be bounded, uniformly for $\frac{1}{2k}\le s\le1$, by
$$\theta\left|A\!\left(\frac nk\right)\right|s^{-\gamma-\rho-\delta}O_P(1)=\theta\sqrt k\left|A\!\left(\frac nk\right)\right|\frac{1}{\sqrt k}s^{-\gamma-\delta}O_P(1)=\theta\frac{1}{\sqrt k}s^{-\gamma-\varepsilon-1/2}s^{\varepsilon+1/2-\delta}O_P(1)=\theta\frac{1}{\sqrt k}s^{-\gamma-\varepsilon-1/2}O_P(1).\tag{26}$$

Now, plugging (24)–(26) into inequality (23) and noting that θ>0 can be chosen arbitrarily small, we obtain the statement in the lemma. □

Define
$$Z_n(s)=\sqrt{k}\left(\frac{X_{n-[ks]:n}-X_{n-k:n}}{a(\frac nk)}-\frac{s^{-\gamma}-1}{\gamma}\right).$$

Then for functions a and A as in Lemma 6.3, for any ε > 0, uniformly for $\frac{1}{2k}\le s\le 1$,
$$Z_n(s)=s^{-\gamma-1}W_1(s)-W_1(1)+\sqrt{k}\,A\!\left(\frac nk\right)\Psi(s^{-1})+o_P(1)\,s^{-\gamma-1/2-\varepsilon}.$$

Hence, for $\gamma>-\frac12$,
$$\sup_{\frac{1}{2k}\le s\le 1}s^{\gamma+1/2+\varepsilon}|Z_n(s)|=O_P(1).\tag{27}$$

Proposition 6.1.

Under the conditions of Lemma 6.3, for $\gamma>-\frac12$ and $\gamma\ne 0$, with probability tending to 1, there exists a unique maximizer of the likelihood function based on $\{X_i\}_{i=1}^n$, denoted as γ̂, such that as $n\to\infty$,
$$\sqrt{k}\,(\hat\gamma-\gamma)-\frac{(\gamma+1)^2}{\gamma}\int_0^1\bigl(s^\gamma-(2\gamma+1)s^{2\gamma}\bigr)Z_n(s)\,ds=o_P(1),$$
and, for γ = 0,
$$\sqrt{k}\,\hat\gamma+\int_0^1(2+\log s)Z_n(s)\,ds=o_P(1).$$

Proof.

The existence of γ̂ follows from Theorem 4.1 in Zhou (2009). Then, using Lemma 6.3 and (27) above in conjunction with Lemma 3.2 in Drees, Ferreira, and de Haan (2004), the result is obtained following the same steps as in the proof of Proposition 3.1 in Drees, Ferreira, and de Haan (2004). A sketch of the main steps of the proof can be found in the supplementary material, Section A. □

To study the asymptotic behavior of ĝ we need the following result. Define
$$\tilde w_n(s)=\frac{n}{\sqrt k}\left(\Gamma_{n+m}\Bigl(\Gamma_{n,2}^{-1}\Bigl(\frac kn s\Bigr)\Bigr)-\frac kn s\right)\qquad\text{and}\qquad \tilde W(s)=\nu W_3(s)-W_2(s).$$

Lemma 6.4.

Assume that $F_2$ is continuous and k satisfies (3). Then for any $0\le\delta<\frac12$, as $n\to\infty$,
$$\sup_{\frac{1}{2k}\le s\le 1}\frac{|\tilde w_n(s)-\tilde W(s)|}{s^\delta}\xrightarrow{P}0.$$

Proof.

We have
$$\tilde w_n(s)=\sqrt{\frac{n}{n+m}}\,w_{n+m}\!\left(\frac nk\Gamma_{n,2}^{-1}\Bigl(\frac kn s\Bigr)\right)+\frac{n}{\sqrt k}\left(\Gamma_{n,2}^{-1}\Bigl(\frac kn s\Bigr)-\frac kn s\right).$$

Define $\hat s=\frac nk\Gamma_{n,2}^{-1}(\frac kn s)$. From Lemma 6.2 with j = 2, (3), and (21), we see that it suffices to show that, as $n\to\infty$,
$$\sup_{\frac{1}{2k}\le s\le 1}\frac{|w_{n+m}(\hat s)-W_3(s)|}{s^\delta}\xrightarrow{P}0.\tag{28}$$

Let $s_0\in(0,1)$. We first handle the region $s\ge s_0$. Obviously we have $1/s^\delta\le 1/s_0^\delta$. By Lemma 6.2, as $n\to\infty$,
$$\sup_{\frac{1}{2k}\le s\le 1}|\hat s-s|\xrightarrow{P}0.\tag{29}$$

Using this, (18), and the uniform continuity of $W_3$ we obtain, as $n\to\infty$,
$$\sup_{s_0\le s\le 1}\frac{|w_{n+m}(\hat s)-W_3(s)|}{s^\delta}\xrightarrow{P}0.$$

It remains to show that for ε > 0 there exists $s_0\in(0,1)$ such that for large n,
$$P\left(\sup_{\frac{1}{2k}\le s\le s_0}\frac{|w_{n+m}(\hat s)-W_3(s)|}{s^\delta}\ge 3\varepsilon\right)\le 3\varepsilon.$$

Using again (21), for this it suffices to show that $P\bigl(\sup_{\frac{1}{2k}\le s\le s_0}|w_{n+m}(\hat s)|/s^\delta\ge 2\varepsilon\bigr)\le 2\varepsilon$. Using Lemma 6.1, the proof is complete if we show that for all ε > 0, κ > 0 there exists $s_0\in(0,1)$ such that for large n, $P\bigl(\sup_{\frac{1}{2k}\le s\le s_0}|w_{n+m}(\hat s)|/\hat s^\delta\ge 2\kappa\bigr)\le\varepsilon$. We have
$$P\left(\sup_{\frac{1}{2k}\le s\le s_0}\frac{|w_{n+m}(\hat s)|}{\hat s^\delta}\ge 2\kappa\right)\le P\left(\sup_{0<t\le 2s_0}\frac{|w_{n+m}(t)|}{t^\delta}\ge\kappa\right)+P\left(\sup_{\frac{1}{2k}\le s\le s_0}\hat s>2s_0\right).$$

From (18) and (21), we have that for small enough $s_0\in(0,1)$ the first term on the right is bounded by ε/2 for large n, and using (29) we obtain that the second term on the right also does not exceed ε/2 for large n. □

In the following we prove a result for the tail quantile process based on $\{\tilde Y_i\}_{i=1}^n$ instead of $\{X_i\}_{i=1}^n$. The proof of the next lemma uses Lemma 6.4 and is very similar to, but easier than, that of Lemma 6.3, and hence will be omitted.

Lemma 6.5.

Let ε > 0. Assume that $F_2$ is continuous and that (3) holds. Then, as $n\to\infty$,
$$\sup_{\frac{1}{2k}\le s\le 1}s^{g+1/2+\varepsilon}\left|\sqrt{k}\left(\frac{\tilde Y_{n-[ks]:n}-\frac{(n/k)^g-1}{g}}{(n/k)^g}-\frac{s^{-g}-1}{g}\right)+s^{-g-1}\tilde W(s)\right|\xrightarrow{P}0.$$

Define
$$H_n(s):=\sqrt{k}\left(\frac{\tilde Y_{n-[ks]:n}-\tilde Y_{n-k:n}}{(n/k)^g}-\frac{s^{-g}-1}{g}\right).$$

Then for any ε > 0, uniformly for $s\in[\frac{1}{2k},1]$,
$$H_n(s)=\tilde W(1)-s^{-g-1}\tilde W(s)+o_P(1)\,s^{-g-1/2-\varepsilon}.$$

Hence, for $g>-\frac12$,
$$\sup_{\frac{1}{2k}\le s\le 1}s^{g+1/2+\varepsilon}|H_n(s)|=O_P(1).\tag{30}$$

Next we show a version of Lemma 3.2 in Drees, Ferreira, and de Haan (2004) based on $\{\tilde Y_i\}_{i=1}^n$.

Lemma 6.6.

Assume that $F_2$ is continuous and k satisfies (3). Let $g_n$ be a sequence of random variables such that $g_n=g+O_P(k^{-1/2})$. Then, if $-\frac12<g<0$ or g > 0, as $n\to\infty$,
$$P\left(1+g_n\,\frac{\tilde Y_{n-[ks]:n}-\tilde Y_{n-k:n}}{(n/k)^g}\ge C_n s^{-g},\ \text{for all }s\in\Bigl[\frac{1}{2k},1\Bigr]\right)\to 1,\tag{31}$$
for some random variables $C_n>0$ such that $1/C_n=O_P(1)$.

If g = 0, as $n\to\infty$,
$$P\left(1+g_n\bigl(\tilde Y_{n-[ks]:n}-\tilde Y_{n-k:n}\bigr)\ge\frac12,\ \text{for all }s\in\Bigl[\frac{1}{2k},1\Bigr]\right)\to 1,\tag{32}$$
and
$$\sup_{s\in[0,1]}\bigl(\tilde Y_{n-[ks]:n}-\tilde Y_{n-k:n}\bigr)=O_P(\log k).\tag{33}$$

Proof.

Consider first $-\frac12<g<0$ or g > 0. Applying Lemma 6.1 to $\Gamma_{n+m}$ and $\Gamma_{n,2}^{-1}$ yields, as $n\to\infty$,
$$\Gamma_{n+m}\Bigl(\Gamma_{n,2}^{-1}\Bigl(\frac kn I\Bigr)\Bigr)\overset{P}{\asymp}\frac kn I\ \Big|_{\frac{1}{2k}}^{1}.$$

Define $G_n(s)=\Gamma_{n+m}\bigl(\Gamma_{n,2}^{-1}(\frac kn s)\bigr)+\frac{1}{2(n+m)}$, $s\in(0,1]$. Hence, as $n\to\infty$,
$$G_n\overset{P}{\asymp}\frac kn I\ \Big|_{\frac{1}{2k}}^{1}.\tag{34}$$

Observe that for $g\ne 0$
$$s^g\left(1+g_n\,\frac{\tilde Y_{n-[ks]:n}-\tilde Y_{n-k:n}}{(n/k)^g}\right)=s^g\left(1+\frac{g_n}{g}\,\frac{(G_n(s))^{-g}-(G_n(1))^{-g}}{(n/k)^g}\right)=\frac{g_n}{g}\left(\frac{G_n(s)}{(ks)/n}\right)^{-g}+s^g\left[\left(1-\left(\frac{G_n(1)}{k/n}\right)^{-g}\right)-\left(\frac{g_n}{g}-1\right)\left(\frac{G_n(1)}{k/n}\right)^{-g}\right]=:T_1(s)+s^g\,[T_2-T_3].$$

From (34) and $g_n/g\xrightarrow{P}1$, we have that $1/\inf_{s\in[1/(2k),1]}T_1(s)=O_P(1)$, as $n\to\infty$. Lemma 6.4 for s = 1 yields that $T_2=O_P(1/\sqrt k)$ and hence, since $g>-\frac12$, $\sup_{s\in[1/(2k),1]}s^g\,T_2\xrightarrow{P}0$. By the assumption on $g_n$ and again (34) we obtain similarly $\sup_{s\in[1/(2k),1]}s^g\,T_3\xrightarrow{P}0$. This yields (31).

In case g = 0, for $\frac{1}{2k}\le s\le 1$,
$$\tilde Y_{n-[ks]:n}-\tilde Y_{n-k:n}=-\log G_n(s)+\log G_n(1)\le 2\log A_n-\log s,\tag{35}$$
with
$$A_n=\max\left(\sup_{s\in[\frac{1}{2k},1]}\frac{G_n(s)}{\frac kn s},\ \sup_{s\in[\frac{1}{2k},1]}\frac{\frac kn s}{G_n(s)}\right).$$

If $g_n\ge 0$, then $1+g_n(\tilde Y_{n-[ks]:n}-\tilde Y_{n-k:n})\ge 1$. If $g_n<0$, then for $\frac{1}{2k}\le s\le 1$,
$$1+g_n\bigl(\tilde Y_{n-[ks]:n}-\tilde Y_{n-k:n}\bigr)\ge 1+g_n\bigl(2\log A_n+\log 2+\log k\bigr).$$

Since, as $n\to\infty$, $A_n=O_P(1)$ and $g_n=O_P(k^{-1/2})$, we obtain (32). Finally, the sup in (33) is attained at $s=1/(2k)$. Hence, (35) yields (33). □

Finally, the following proposition provides the asymptotic behavior of the pseudo-MLE based on $\{\tilde Y_i\}_{i=1}^n$.

Proposition 6.2.

Assume that $F_2$ is continuous and k satisfies (3). For $g>-\frac12$ and $g\ne 0$, with probability tending to 1, there exists a unique maximizer of the likelihood function based on $\{\tilde Y_i\}_{i=1}^n$, denoted as ĝ, such that, as $n\to\infty$,
$$\sqrt{k}\,(\hat g-g)-\frac{(g+1)^2}{g}\int_0^1\bigl(s^g-(2g+1)s^{2g}\bigr)H_n(s)\,ds=o_P(1),$$
and, for g = 0,
$$\sqrt{k}\,\hat g+\int_0^1(2+\log s)H_n(s)\,ds=o_P(1).$$

Proof.

The proof follows the same steps as that of Proposition 6.1; see the supplementary material, Section A. The main difference is that $\{\tilde Y_i\}_{i=1}^n$ are not iid observations. Nevertheless, Lemma 6.5 guarantees that statistics based on the tail quantile process of $\{\tilde Y_i\}_{i=1}^n$, for example the Hill estimator for g > 0, possess similar asymptotic behavior as in the iid case, with the only difference that the random limit is driven by a proper functional of $\tilde W$ instead. Such asymptotic expansions are sufficient to ensure that the proof can still be realized.

Then, following the steps explained in the proof of Proposition 6.1, by using Lemma 6.5, (30), and Lemma 6.6, we get the analogous result as in Proposition 6.1. Note that Lemma 6.6 plays an analogous role as Lemma 3.2 in Drees, Ferreira, and de Haan (2004). □

Proof of Proposition 2.1.

Combining (18) and Propositions 6.1 and 6.2 we obtain, as $n\to\infty$,
$$\bigl(\sqrt{k}(\hat\gamma-\gamma),\ \sqrt{k}(\hat g-g)\bigr)\xrightarrow{d}(\Omega,\tilde\Omega),$$
where
$$\Omega=\frac{(\gamma+1)^2}{\gamma}\int_0^1\bigl(s^\gamma-(2\gamma+1)s^{2\gamma}\bigr)\bigl(s^{-\gamma-1}W_1(s)-W_1(1)\bigr)\,ds+\frac{\lambda(\gamma+1)}{(1-\rho)(1+\gamma-\rho)}$$

and
$$\tilde\Omega=\frac{(g+1)^2}{g}\int_0^1\bigl(t^g-(2g+1)t^{2g}\bigr)\bigl(\tilde W(1)-t^{-g-1}\tilde W(t)\bigr)\,dt.$$

Since the Wiener processes involved have mean zero, we immediately obtain the mean of the limiting pair. Also the individual variances of the limiting pair follow readily, see Drees, Ferreira, and de Haan (2004). The proof is completed by calculating the covariance of the two terms, which is deferred to the supplementary material, Section B. □

Proof of Theorem 2.1.

From the uniform consistency of R̂ on $[0,1]^2$, it can be shown that $\hat R_g\xrightarrow{P}R_g$. Using the latter convergence in combination with Proposition 2.1 we obtain that, as $n\to\infty$,
$$\sqrt{k}\,(\hat\gamma_g-\gamma)=\sqrt{k}\,(\hat\gamma-\gamma)-\frac{1+\gamma}{1+g}\,R_g\,\sqrt{k}\,(\hat g-g)+o_P(1).\tag{36}$$

Now Proposition 2.1 in conjunction with the continuous mapping theorem yields (8). □

The proof of Proposition 3.1 can be given along the same lines as that of Proposition 2.1 and will be omitted. Note that the lemmas and propositions needed for the proof of Proposition 2.1 are of a univariate nature and that hence very similar lemmas can immediately be stated (and proved) in the more-covariates case. Once these results are given, only a straightforward covariance calculation remains; see Ahmed and Einmahl (2019) for the joint weak convergence of all the tail empirical processes involved.

Proof of Theorem 3.1.

From the uniform consistency of the tail copula estimators we obtain $\hat H^{-1}_{1j}\xrightarrow{P}H^{-1}_{1j}$, $j=1,\ldots,d$, which in combination with Proposition 3.1 yields that, as $n\to\infty$,
$$\sqrt{k}\,(\hat\gamma_g-\gamma)=\sqrt{k}\,(\hat\gamma-\gamma)+\frac{1+\gamma}{1+g}\sum_{j=2}^d\frac{H^{-1}_{1j}}{H^{-1}_{11}}\,\sqrt{k}\,(\hat g_j-g)+o_P(1).$$

Now Proposition 3.1 and the continuous mapping theorem yield (15). □

Supplementary Materials

The supplementary material provides additional proofs needed, additional simulation results, as well as all codes used for the simulations.


Acknowledgments

We thank the Editor, Associate Editor, and three Referees for a careful reading of the article and various helpful comments that greatly improved the article. John Einmahl holds the Arie Kapteyn Chair 2019–2022 and gratefully acknowledges the corresponding research support.

Disclosure Statement

The authors report there are no competing interests to declare.

Notes

1 We thank Météo France for providing the dataset. The data are available upon request, with the consent of Météo France.

References

  • Ahmed, H., and Einmahl, J. H. J. (2019), “Improved Estimation of the Extreme Value Index Using Related Variables,” Extremes, 22, 553–569. DOI: 10.1007/s10687-019-00358-y.
  • Azriel, D., Brown, L. D., Sklar, M., Berk, R., Buja, A., and Zhao, L. (2022), “Semi-Supervised Linear Regression,” Journal of the American Statistical Association, 177, 2238–2251. DOI: 10.1080/01621459.2021.1915320.
  • Beirlant, J., Goegebeur, Y., Segers, J., and Teugels, J. (2004), Statistics of Extremes: Theory and Applications, Chichester: Wiley.
  • Berghaus, B., and Bücher, A. (2018), “Weak Convergence of a Pseudo Maximum Likelihood Estimator for the Extremal Index,” The Annals of Statistics, 46, 2307–2335. DOI: 10.1214/17-AOS1621.
  • Buishand, T., de Haan, L., and Zhou, C. (2008), “On Spatial Extremes: With Application to a Rainfall Problem,” The Annals of Applied Statistics, 2, 624–642. DOI: 10.1214/08-AOAS159.
  • Chakrabortty, A., and Cai, T. (2018), “Efficient and Adaptive Linear Regression in Semi-Supervised Settings,” The Annals of Statistics, 46, 1541–1572. DOI: 10.1214/17-AOS1594.
  • Coles, S. G., and Tawn, J. A. (1991), “Modelling Extreme Multivariate Events,” Journal of the Royal Statistical Society, Series B, 53, 377–392. DOI: 10.1111/j.2517-6161.1991.tb01830.x.
  • Coles, S. G., and Walshaw, D. (1994), “Directional Modelling of Extreme Wind Speeds,” Journal of the Royal Statistical Society, Series C, 43, 139–157.
  • de Haan, L., and de Ronde, J. (1998), “Sea and Wind: Multivariate Extremes at Work,” Extremes, 1, 7–45. DOI: 10.1023/A:1009909800311.
  • de Haan, L., and Ferreira, A. (2006), Extreme Value Theory: An Introduction, New York: Springer.
  • de Haan, L., and Rootzén, H. (1993), “On the Estimation of High Quantiles,” Journal of Statistical Planning and Inference, 35, 1–13. DOI: 10.1016/0378-3758(93)90063-C.
  • Dekkers, A. L., Einmahl, J. H. J., and de Haan, L. (1989), “A Moment Estimator for the Index of an Extreme-Value Distribution,” The Annals of Statistics, 17, 1833–1855. DOI: 10.1214/aos/1176347397.
  • Drees, H., Ferreira, A., and de Haan, L. (2004), “On Maximum Likelihood Estimation of the Extreme Value Index,” The Annals of Applied Probability, 14, 1179–1201. DOI: 10.1214/105051604000000279.
  • Drees, H., and Huang, X. (1998), “Best Attainable Rates of Convergence for Estimators of the Stable Tail Dependence Function,” Journal of Multivariate Analysis, 64, 25–46. DOI: 10.1006/jmva.1997.1708.
  • Einmahl, J. H. J. (1992), “Limit Theorems for Tail Processes with Application to Intermediate Quantile Estimation,” Journal of Statistical Planning and Inference, 32, 137–145. DOI: 10.1016/0378-3758(92)90156-M.
  • Einmahl, J. H. J., de Haan, L., and Zhou, C. (2016), “Statistics of Heteroscedastic Extremes,” Journal of the Royal Statistical Society, Series B, 78, 31–51. DOI: 10.1111/rssb.12099.
  • Oesting, M., and Naveau, P. (2020), “Spatial Modeling of Heavy Precipitation by Coupling Weather Station Recordings and Ensemble Forecasts with Max-Stable Processes,” arXiv preprint arXiv:2003.05854.
  • Shorack, G. R., and Wellner, J. A. (2009), Empirical Processes with Applications to Statistics, Philadelphia: SIAM.
  • Smith, R. L. (1987), “Estimating Tails of Probability Distributions,” The Annals of Statistics, 15, 1174–1207. DOI: 10.1214/aos/1176350499.
  • Vapnik, V. (2013), The Nature of Statistical Learning Theory, New York: Springer.
  • Vervaat, W. (1972), “Functional Central Limit Theorems for Processes with Positive Drift and their Inverses,” Probability Theory and Related Fields, 23, 245–253.
  • Wasserman, L., and Lafferty, J. D. (2008), “Statistical Analysis of Semi-Supervised Regression,” in Advances in Neural Information Processing Systems, pp. 801–808.
  • Zhang, A., Brown, L. D., and Cai, T. T. (2019), “Semi-Supervised Inference: General Theory and Estimation of Means,” The Annals of Statistics, 47, 2538–2566. DOI: 10.1214/18-AOS1756.
  • Zhou, C. (2009), “Existence and Consistency of the Maximum Likelihood Estimator for the Extreme Value Index,” Journal of Multivariate Analysis, 100, 794–815. DOI: 10.1016/j.jmva.2008.08.009.
  • Zhu, X., and Goldberg, A. B. (2009), “Introduction to Semi-Supervised Learning,” Synthesis Lectures on Artificial Intelligence and Machine Learning, 3, 1–130. DOI: 10.2200/S00196ED1V01Y200906AIM006.