Short Communications

Discussion on “on studying extreme values and systematic risks with nonlinear time series models and tail dependence measures”

Extreme value theory provides essential mathematical foundations for modelling tail risks and has wide applications. The emergence of big and heterogeneous data calls for the development of new extreme value theory and methods. For studying high-dimensional extremes and extreme clusters in time series, an important problem is how to measure and test for tail dependence between random variables. Section 3.1 of Dr. Zhang's paper discusses some newly proposed tail dependence measures. In the era of big data, a timely and challenging question is how to study data from heterogeneous populations, e.g. from different sources. Section 3.2 reviews some new developments of extreme value theory for maxima of maxima. The theory and methods in Sections 3.1 and 3.2 set the foundations for modelling extremes of multivariate and heterogeneous data, and we believe they have wide applicability. We will discuss two possible directions: (1) measuring and testing of partial tail dependence; (2) application of the extreme value theory for maxima of maxima in high-dimensional inference.

1. Partial tail dependence

Identifying the tail dependence between random variables can be helpful for determining appropriate multivariate extreme value distributions. Section 3.1 of Dr. Zhang's article reviews a series of works on tail dependence. In particular, a test for tail independence based on the tail quotient correlation coefficient (TQCC) was introduced. The TQCC has attractive properties, including simple interpretation and computation, and it has been successfully applied to financial risk studies, precipitation extremes, and so on. The method and theory are based on the assumption that $\{(X_i, Y_i), i = 1, \ldots, n\}$ is a random sample of $(X, Y)$, and the aim is to study the tail dependence between $X$ and $Y$. However, in some applications, both $X$ and $Y$ may depend on some other covariates $Z$. For example, Wang et al. (2012) showed that in the study of downscaling of precipitation, the coarser-resolution predictor variables generated from a global climate model can be used to predict the local extreme precipitations. For joint modelling and prediction of precipitation with other meteorological variables such as temperature, it will be important to assess the conditional tail dependence of these variables given $Z$, the global climate model outputs.

For any two random variables $X$ and $Y$, which may not be identically distributed, the (upper) tail dependence index is defined as

$$\lambda = \lim_{\tau \to 1} P\{Y > Q_\tau(Y) \mid X > Q_\tau(X)\},$$

where $Q_\tau(Y)$ and $Q_\tau(X)$ are the $\tau$th quantiles of $Y$ and $X$, respectively. Dr. Zhang and his collaborators proposed the TQCC for measuring and testing the tail dependence. If there exist confounding variables $Z$ that are likely to be related to both $X$ and $Y$, using the TQCC may lead to misleading conclusions. To account for the confounding factors, we can define the following partial tail dependence index

$$\lambda_{XYZ}(Z) = \lim_{\tau \to 1} P\{Y > Q_\tau(Y|Z) \mid X > Q_\tau(X|Z)\},$$

where $Q_\tau(Y|Z)$ and $Q_\tau(X|Z)$ are the $\tau$th conditional quantiles of $Y$ and $X$ given $Z$, respectively. We conjecture that the index $\lambda_{XYZ}(Z)$ can capture the tail dependence between $X$ and $Y$ after accounting for the effect of $Z$. It would be interesting to study the interpretation and properties of $\lambda_{XYZ}$, together with its connections to and distinctions from $\lambda$.
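As a rough illustration of the unconditional index $\lambda$, the probability inside the limit can be estimated at a fixed level $\tau$ close to 1 by counting joint exceedances. The sketch below is only a finite-level plug-in under that assumption; the function names `empirical_quantile` and `empirical_tail_dependence` are hypothetical and the quantile rule (a plain order statistic) is a simplifying choice, not something taken from Dr. Zhang's paper.

```python
def empirical_quantile(values, tau):
    """Order statistic at position floor(tau * n): a simple quantile estimate."""
    ordered = sorted(values)
    index = min(int(tau * len(ordered)), len(ordered) - 1)
    return ordered[index]


def empirical_tail_dependence(x, y, tau):
    """Estimate P{Y > Q_tau(Y) | X > Q_tau(X)} at a fixed level tau < 1."""
    qx = empirical_quantile(x, tau)
    qy = empirical_quantile(y, tau)
    # Keep the Y values of observations whose X exceeds its tau-th quantile.
    y_given_x_exceeds = [yi for xi, yi in zip(x, y) if xi > qx]
    if not y_given_x_exceeds:
        return float("nan")  # no exceedances of X at this level
    joint = sum(1 for yi in y_given_x_exceeds if yi > qy)
    return joint / len(y_given_x_exceeds)
```

For a comonotone pair the estimate equals 1 and for a counter-monotone pair it equals 0; for independent data it should be close to $1 - \tau$ and shrink as $\tau$ approaches 1.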

Suppose that we are interested in testing $H_0: \lambda_{XYZ} = 0$ against the alternative hypothesis $H_a: \lambda_{XYZ} > 0$. There are two possible ways to extend the TQCC-based method in Zhang et al. (2017) to test for the partial tail independence. Such tests can also be useful for the selection of orders in autoregressive models; see the discussion of quantile partial correlation in Li et al. (2015) for some related applications.

The first approach is a plug-in method. The idea is to remove the effects of $Z$ on $Y$ and $X$ separately, and then assess the dependence of the estimated residuals. This approach requires specific forms for the two regression models. For instance, we may consider the following location-scale shift linear regression models,

$$X = Z^T\beta_1 + (Z^T\gamma_1)\epsilon_1, \qquad Y = Z^T\beta_2 + (Z^T\gamma_2)\epsilon_2, \tag{1}$$

where $(\epsilon_1, \epsilon_2)$ are random errors, $\beta_j, \gamma_j$ are the unknown location and scale parameters, and $Z^T\gamma_j > 0$, $j = 1, 2$. Given a random sample $(X_i, Y_i, Z_i)$, $i = 1, \ldots, n$, we can estimate the parameters $(\beta_j, \gamma_j)$ by $(\hat\beta_j, \hat\gamma_j)$ using existing regression methods, for instance, the method in He (1997). Denote $\hat\epsilon_{1i} = (X_i - Z_i^T\hat\beta_1)/Z_i^T\hat\gamma_1$ and $\hat\epsilon_{2i} = (Y_i - Z_i^T\hat\beta_2)/Z_i^T\hat\gamma_2$. We can then define the partial tail quotient correlation coefficient as

$$q_{n1} = \frac{\max_{1 \le i \le n}\left\{\frac{\max(\hat\epsilon_{1i}, u_n)}{\max(\hat\epsilon_{2i}, u_n)} - 1\right\} + \max_{1 \le i \le n}\left\{\frac{\max(\hat\epsilon_{2i}, u_n)}{\max(\hat\epsilon_{1i}, u_n)} - 1\right\}}{\max_{1 \le i \le n}\left\{\frac{\max(\hat\epsilon_{1i}, u_n)}{\max(\hat\epsilon_{2i}, u_n)}\right\} \times \max_{1 \le i \le n}\left\{\frac{\max(\hat\epsilon_{2i}, u_n)}{\max(\hat\epsilon_{1i}, u_n)}\right\} - 1},$$

where $u_n$ is a varying threshold that tends to infinity. Under the model assumptions in (1), it can be shown that the partial tail dependence $\lambda_{XYZ}$ is the same as the tail dependence index between $\epsilon_1$ and $\epsilon_2$. One limitation of this approach is that it relies on the location-scale shift model assumption and thus may be susceptible to model misspecification, though we can increase the flexibility by modelling the location and scale functions nonparametrically; see for instance He (1997), Keilegom & Wang (2010) and Pang et al. (2015).
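The plug-in construction can be sketched in a few lines of Python. This is a minimal illustration, assuming the location and scale coefficients have already been estimated by some regression method (the estimation step itself is not shown); the helper names `standardized_residuals` and `partial_tqcc` are hypothetical.

```python
def standardized_residuals(response, covariates, beta, gamma):
    """Compute (response_i - z_i' beta) / (z_i' gamma) for each observation.

    `beta` and `gamma` are assumed to be fitted location and scale coefficients.
    """
    residuals = []
    for value, z in zip(response, covariates):
        location = sum(zk * bk for zk, bk in zip(z, beta))
        scale = sum(zk * gk for zk, gk in zip(z, gamma))
        if scale <= 0:
            raise ValueError("the scale z' gamma must be positive")
        residuals.append((value - location) / scale)
    return residuals


def partial_tqcc(res1, res2, threshold):
    """Tail quotient correlation of two residual series at a positive threshold."""
    if threshold <= 0:
        raise ValueError("the threshold must be positive")
    t1 = [max(r, threshold) for r in res1]
    t2 = [max(r, threshold) for r in res2]
    ratio_12 = max(a / b for a, b in zip(t1, t2))
    ratio_21 = max(b / a for a, b in zip(t1, t2))
    denominator = ratio_12 * ratio_21 - 1.0
    if denominator == 0.0:
        return float("nan")  # both maximal ratios equal 1: the quotient is 0/0
    return ((ratio_12 - 1.0) + (ratio_21 - 1.0)) / denominator
```

Values near 0 arise when large residuals in one series are paired with small ones in the other, while values near 1 arise when the thresholded residuals move together.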

The second approach is a quantile-regression-based method, which requires modelling the tail conditional quantiles of $X$ and $Y$ given $Z$. Let $\hat Q_\tau(X|Z)$ and $\hat Q_\tau(Y|Z)$ be the estimated conditional $\tau$th quantiles of $X$ and $Y$ given $Z$, respectively, which can be obtained from either parametric (Wang et al., 2012) or semiparametric quantile regression (Xu et al., 2020). Then we can define a second version of the partial tail quotient correlation coefficient as

$$q_{n2} = \frac{\max_{1 \le i \le n}\left\{\frac{\max(X_i, u_n(Z_i))}{\max(Y_i, v_n(Z_i))} - 1\right\} + \max_{1 \le i \le n}\left\{\frac{\max(Y_i, v_n(Z_i))}{\max(X_i, u_n(Z_i))} - 1\right\}}{\max_{1 \le i \le n}\left\{\frac{\max(X_i, u_n(Z_i))}{\max(Y_i, v_n(Z_i))}\right\} \times \max_{1 \le i \le n}\left\{\frac{\max(Y_i, v_n(Z_i))}{\max(X_i, u_n(Z_i))}\right\} - 1},$$

where $u_n(Z) = \hat Q_\tau(X|Z)$, $v_n(Z) = \hat Q_\tau(Y|Z)$, and $\tau \to 1$ as $n \to \infty$.
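The second coefficient differs from the first only in that each observation carries its own pair of thresholds. A minimal sketch follows, assuming the fitted conditional quantiles have already been computed by some quantile regression routine and are passed in as plain lists; the function name `partial_tqcc_quantile` and its argument names are hypothetical.

```python
def partial_tqcc_quantile(x, y, x_thresholds, y_thresholds):
    """Tail quotient correlation with observation-specific thresholds.

    x_thresholds[i] and y_thresholds[i] stand for the fitted conditional
    tau-th quantiles of X and Y at Z_i; all thresholds must be positive.
    """
    if any(u <= 0 for u in x_thresholds) or any(v <= 0 for v in y_thresholds):
        raise ValueError("all thresholds must be positive")
    tx = [max(xi, ui) for xi, ui in zip(x, x_thresholds)]
    ty = [max(yi, vi) for yi, vi in zip(y, y_thresholds)]
    ratio_xy = max(a / b for a, b in zip(tx, ty))
    ratio_yx = max(b / a for a, b in zip(tx, ty))
    denominator = ratio_xy * ratio_yx - 1.0
    if denominator == 0.0:
        return float("nan")  # degenerate 0/0 case
    return ((ratio_xy - 1.0) + (ratio_yx - 1.0)) / denominator
```

Because the thresholds enter through ratios, this version is sensitive to the relative scales of X and Y, which is one reason its asymptotic behaviour may need separate study.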

Similar to the TQCC, the partial TQCCs $q_{n1}$ and $q_{n2}$ can be used to test whether $X$ and $Y$ are tail independent after adjusting for the effect of $Z$. Theorem 4 in Zhang et al. (2017) establishes the limiting distribution of the TQCC under $H_0: \lambda = 0$. The measure $q_{n1}$ assumes the location-scale shift models. If the location and scale functions are estimated consistently at a certain rate, we conjecture that $q_{n1}$ has asymptotic properties similar to those of $q_{u_n}$ in Zhang et al. (2017), so inference can be conducted by using the asymptotic $\chi^2$ distribution. The asymptotic properties of $q_{n2}$ would require more careful investigation.

We conduct a small simulation study by generating data from the following model:

$$X_i = 1 + Z_i + \epsilon_{i1}, \qquad Y_i = 1 + 2Z_i + (1 + \gamma Z_i)\epsilon_{i2}, \qquad i = 1, \ldots, n,$$

where $\epsilon_{i2} = \sigma\epsilon_{i1} + \epsilon_{i3}$, $\epsilon_{i1}$ and $\epsilon_{i3}$ are independent standard Fréchet random variables, and $Z_i \sim U(0,1)$. We let $n = 1000$ and consider two cases: the homoscedastic case with $\gamma = 0$ and the heteroscedastic case with $\gamma = 2$. The simulation is repeated 1000 times for each scenario. For $q_{n1}$, we take $u_n$ as the maximum of the 95th percentiles of the estimated residuals $\hat\epsilon_{i1}$ and $\hat\epsilon_{i2}$, $i = 1, \ldots, n$. For $q_{n2}$, we let $\tau = 0.95$. Figure 1 plots the empirical density of $T_n = 2n\{1 - \exp(-1/u_n)\}q_{n1}$ obtained under the null model with $\sigma = 0$ and the density of $\chi^2(4)$. Results show that, as in Theorem 3.8 of Dr. Zhang's paper for $q_{u_n}$, $\chi^2(4)$ provides a good approximation to the normalised $q_{n1}$ under $H_0$. Figure 2 plots the power curves of tests based on $q_{n1}$ and $q_{n2}$, using the Monte Carlo critical values, against $\sigma$. The power of both tests increases gradually with $\sigma$, while $q_{n2}$ exhibits higher power than $q_{n1}$ for detecting the partial tail dependence. It would be interesting to further study the theoretical and empirical properties of $q_{n2}$.
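A simplified version of this experiment can be sketched with the Python standard library. The sketch makes one loud simplification: instead of estimating the location and scale parameters by regression, it standardises with the true coefficients (oracle residuals), so it only illustrates the behaviour of the normalised statistic and is not a reproduction of the reported study. All function names are hypothetical.

```python
import math
import random


def unit_frechet(rng):
    """Draw from the standard Frechet law F(x) = exp(-1/x) by inversion."""
    u = rng.random()
    while u == 0.0:  # guard against log(0)
        u = rng.random()
    return -1.0 / math.log(u)


def upper_percentile(values, level):
    """Order statistic at position floor(level * n)."""
    ordered = sorted(values)
    return ordered[min(int(level * len(ordered)), len(ordered) - 1)]


def tqcc(e1, e2, u):
    """Tail quotient correlation of two series at a common threshold u."""
    t1 = [max(a, u) for a in e1]
    t2 = [max(b, u) for b in e2]
    r12 = max(a / b for a, b in zip(t1, t2))
    r21 = max(b / a for a, b in zip(t1, t2))
    denominator = r12 * r21 - 1.0
    if denominator == 0.0:
        return float("nan")
    return ((r12 - 1.0) + (r21 - 1.0)) / denominator


def simulate_tn(n, sigma, gamma, rng, level=0.95):
    """Generate one sample from the model and return T_n = 2n{1 - exp(-1/u_n)} q_n1."""
    e1, e2 = [], []
    for _ in range(n):
        z = rng.random()
        eps1 = unit_frechet(rng)
        eps2 = sigma * eps1 + unit_frechet(rng)
        x = 1.0 + z + eps1
        y = 1.0 + 2.0 * z + (1.0 + gamma * z) * eps2
        # Oracle standardisation with the true coefficients (an assumption;
        # the study described above estimates them by regression).
        e1.append(x - 1.0 - z)
        e2.append((y - 1.0 - 2.0 * z) / (1.0 + gamma * z))
    u = max(upper_percentile(e1, level), upper_percentile(e2, level))
    return 2.0 * n * (1.0 - math.exp(-1.0 / u)) * tqcc(e1, e2, u)


def chi2_4_sf(t):
    """Survival function of the chi-square law with 4 degrees of freedom."""
    return math.exp(-t / 2.0) * (1.0 + t / 2.0)


def rejection_rate(n, sigma, gamma, reps, rng, alpha=0.05):
    """Share of replications whose chi-square(4) p-value falls below alpha."""
    hits = sum(1 for _ in range(reps)
               if chi2_4_sf(simulate_tn(n, sigma, gamma, rng)) < alpha)
    return hits / reps
```

Calling `rejection_rate(1000, 0.0, 0.0, 200, random.Random(0))` gives an empirical size under the null, and increasing `sigma` shows how the rejection rate responds to tail dependence in the errors.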

Figure 1. The empirical density of $T_n = 2n\{1 - \exp(-1/u_n)\}q_{n1}$ under $H_0$ and the density of $\chi^2(4)$.

Figure 2. Power of tests based on $q_{n1}$ and $q_{n2}$ for testing the partial tail independence between $X$ and $Y$ given $Z$.

2. Maxima of maxima for high-dimensional inference

In Section 3.2, Dr. Zhang introduces some newly developed extreme value theory for the maxima of $k$ maxima, from either different variables or subsequences of the same variable. Denote $M_{l,n_l} = \max\{Y_{l,1}, \ldots, Y_{l,n_l}\}$, $l = 1, \ldots, k$, and $M_n = \max(M_{1,n_1}, \ldots, M_{k,n_k})$. Section 3.2 reviews some new results for the limiting distribution of $M_n$. We believe such results would be very useful for high-dimensional inference with multivariate responses.

Maximum-type statistics and extreme value theory have been used in many high-dimensional inference problems. Examples include testing for high-dimensional mean differences (Cai et al., 2014; Xu et al., 2016), inference on high-dimensional correlation matrices (Jiang, 2004; Xiao & Wu, 2013), and testing and identification of significant predictors (Tang & Pan, 2020), to name just a few. In these works, the test statistics are defined as the maximum of a large number of independent or dependent random variables, e.g., the sample correlations (Jiang, 2004), the normalised squared differences of sample covariances of $p$ variables from two populations (Cai et al., 2013), the squared sample mean differences of $p$ variables from two populations (Cai et al., 2014), or the squared score statistics capturing the impacts of $p$ predictors on the response variable (Tang & Pan, 2020; Wu et al., 2019), where $p$ is often larger than the sample size $n$. The hypothesis testing is then conducted by using the limiting distribution of the maximum-type statistic, that is, the Type I extreme value distribution. One typical application is in genome-wide association studies, where one main interest is in comparing the means or covariances of a large number of single nucleotide polymorphisms (SNPs) between treatment and control, or detecting possible associations between a phenotype or disease and gene pathways.

In some applications, the researcher may be interested in assessing the association between $p$ SNPs and multiple diseases or phenotypes jointly. The new extreme value theory of maxima of maxima in Dr. Zhang's work could be helpful for developing valid testing procedures for such applications. For simplicity, we will use $k = 2$ to illustrate the possible application in this context. Let $S_{l,j,n}^2$ denote the squared normalised score test statistic measuring the effect of the $j$th SNP on the $l$th phenotype, where $j = 1, \ldots, p$ and $l = 1, 2$. Under some regularity conditions, it can often be shown that the $S_{l,j,n}$ are asymptotically normal variables that are likely to be correlated. Suppose that we want to test the null hypothesis $H_0$: none of the SNPs from a gene pathway is associated with the two phenotypes, against the alternative $H_A$: there exists at least one SNP that has an association with either or both phenotypes. One natural test statistic is

$$M_n = \max(M_{1,n}, M_{2,n}), \quad \text{where} \quad M_{l,n} = \max_{j = 1, \ldots, p} S_{l,j,n}^2, \quad l = 1, 2,$$

and we would reject $H_0$ for large $M_n$. Theorem 3.11 in Dr. Zhang's work can help determine the critical value or calculate the p-value. Let $F_1$ and $F_2$ be the cumulative distribution functions of $S_{1,j,n}^2$ and $S_{2,j,n}^2$ for $j = 1, \ldots, p$. In this context, $F_1$ and $F_2$ are approximately the $\chi^2(1)$ distribution under $H_0$. Let $m_{1,n}$ and $m_{2,n}$ be the observed values of $M_{1,n}$ and $M_{2,n}$. Define $m_n = \max\{m_{1,n}, m_{2,n}\}$. Then based on Theorem 3.11, we can approximate the p-value with

$$\text{pval} = P(M_n > m_n \mid H_0) = 1 - P(M_{1,n} \le m_n, M_{2,n} \le m_n \mid H_0) \approx 1 - e^{-\tau_1 - \tau_2}, \tag{2}$$

where $\tau_1 = n\{1 - F_1(m_{1,n})\}$ and $\tau_2 = n\{1 - F_2(m_{2,n})\}$.
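The p-value approximation in (2) is easy to evaluate numerically. The sketch below takes one particular reading of the formula, which is an assumption on our part: each $\tau_l$ is computed as the number of statistics in the $l$th block times the $\chi^2(1)$ tail probability at the overall observed maximum $m_n$. The function names are hypothetical, and the $\chi^2(1)$ tail is obtained from the identity $P(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$.

```python
import math


def chi2_1_sf(x):
    """Survival function of the chi-square law with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) if x > 0.0 else 1.0


def max_of_maxima_pvalue(scores_1, scores_2):
    """Approximate P(M_n > m_n | H0) by 1 - exp(-(tau_1 + tau_2)).

    scores_l holds the normalised (unsquared) score statistics for phenotype l.
    Assumption: tau_l = (block size) * P(chi2_1 > m_n), with m_n the overall maximum.
    """
    m1 = max(s * s for s in scores_1)
    m2 = max(s * s for s in scores_2)
    m_n = max(m1, m2)
    tau_1 = len(scores_1) * chi2_1_sf(m_n)
    tau_2 = len(scores_2) * chi2_1_sf(m_n)
    # -expm1(-t) equals 1 - exp(-t) but stays accurate for very small t.
    return -math.expm1(-(tau_1 + tau_2))
```

A single very large score in either block drives the p-value towards 0, while blocks of uniformly small scores give a p-value close to 1.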

We conduct a small simulation study to try out this idea. We generate $S_{1,j,n}$ and $S_{2,j,n}$ from the bivariate normal distribution with means $b$, unit variances and correlation $\rho$, where $j = 1, \ldots, p = 1000$. Figure 3 shows the power curves of this test against $b$ for $\rho = 0$ and $0.5$. The simulation results suggest that the test based on the extreme value theory for maxima of maxima performs well: the type I error is controlled around the nominal level of 0.05, and the power increases gradually with the signal $b$. However, as mentioned above, in GWAS applications the $S_{l,j,n}$ are often correlated across $l$ and $j = 1, \ldots, p$, so further research is needed to provide rigorous justification for applying the maxima of maxima theory to multivariate high-dimensional inference problems.
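A small Monte Carlo check of this kind can be sketched as follows. The sketch assumes that the pairs $(S_{1,j,n}, S_{2,j,n})$ are independent across $j$, that every pair shares the same mean $b$, and that the p-value uses the block size and the overall maximum inside each $\tau_l$; the function names and the number of replications are hypothetical choices, not the settings behind Figure 3.

```python
import math
import random


def chi2_1_sf(x):
    """Survival function of the chi-square law with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) if x > 0.0 else 1.0


def max_of_maxima_pvalue(scores_1, scores_2):
    """Approximate p-value 1 - exp(-(tau_1 + tau_2)) at the overall maximum."""
    m_n = max(max(s * s for s in scores_1), max(s * s for s in scores_2))
    tau = (len(scores_1) + len(scores_2)) * chi2_1_sf(m_n)
    return -math.expm1(-tau)


def rejection_rate(b, rho, p, reps, rng, alpha=0.05):
    """Monte Carlo rejection rate for bivariate normal scores with mean b."""
    rejections = 0
    for _rep in range(reps):
        s1, s2 = [], []
        for _j in range(p):
            a = rng.gauss(0.0, 1.0)
            w = rng.gauss(0.0, 1.0)
            s1.append(b + a)
            # rho * a + sqrt(1 - rho^2) * w has unit variance and correlation rho with a.
            s2.append(b + rho * a + math.sqrt(1.0 - rho * rho) * w)
        if max_of_maxima_pvalue(s1, s2) < alpha:
            rejections += 1
    return rejections / reps
```

Running `rejection_rate(0.0, 0.0, 1000, 200, random.Random(2))` estimates the size under the null, and sweeping `b` upwards traces an empirical power curve for a chosen `rho`.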

Figure 3. Power curve of the test based on the maxima of maxima theory for high-dimensional inference with two phenotypes. The parameter $\rho$ corresponds to the correlation used in the simulation. The x-axis represents the deviation from the null hypothesis. The dotted horizontal line corresponds to the 0.05 nominal level.

Acknowledgments

We congratulate Dr. Zhang for a stimulating and interesting article on the important topics of extreme values with nonlinear time series models and tail dependence measures and thank Professor Jun Shao for giving us the opportunity to discuss this work.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Notes on contributors

Wen Xu

Wen Xu is a PhD candidate at Fudan University.

Huixia Judy Wang

Huixia Judy Wang is a Professor at The George Washington University.

References
