Statistical Theory and Related Fields Volume 5, 2021 - Issue 1

Short Communications

Discussion on “on studying extreme values and systematic risks with nonlinear time series models and tail dependence measures”

Wen Xu, Department of Statistics, Fudan University, Shanghai, People's Republic of China (corresponding author)

Huixia Judy Wang, Department of Statistics, The George Washington University, Washington, DC, USA

Pages 26-30 | Published online: 04 Mar 2021

https://doi.org/10.1080/24754269.2021.1895528

Extreme value theory provides essential mathematical foundations for modelling tail risks and has wide applications. The emergence of big and heterogeneous data calls for the development of new extreme value theory and methods. For studying high-dimensional extremes and extreme clusters in time series, an important problem is how to measure and test for tail dependence between random variables. Section 3.1 of Dr. Zhang's paper discusses some newly proposed tail dependence measures. In the era of big data, a timely and challenging question is how to study data from heterogeneous populations, e.g. from different sources. Section 3.2 reviews some new developments of extreme value theory for maxima of maxima. The theory and methods in Sections 3.1 and 3.2 set the foundations for modelling extremes of multivariate and heterogeneous data, and we believe they have wide applicability. We will discuss two possible directions: (1) measuring and testing partial tail dependence; (2) applying the extreme value theory for maxima of maxima to high-dimensional inference.

1. Partial tail dependence

Identifying the tail dependence between random variables can be helpful for determining appropriate multivariate extreme value distributions. Section 3.1 of Dr. Zhang's article reviews a series of works on tail dependence. In particular, a test for tail independence based on the tail quotient correlation coefficient (TQCC) was introduced. The TQCC has nice properties, including simple interpretation and computation, and it has been successfully applied to financial risk studies, precipitation extremes, and so on. The method and theory are based on the assumption that $\{(X_{i}, Y_{i}), i = 1, \dots, n\}$ is a random sample of $(X, Y)$, and the aim is to study the tail dependence between X and Y. However, in some applications, both X and Y may depend on some other covariates Z. For example, Wang et al. (2012) showed that, in the downscaling of precipitation, the coarser-resolution predictor variables generated from a global climate model can be used to predict local extreme precipitation. For joint modelling and prediction of precipitation with other meteorological variables such as temperature, it will be important to assess the conditional tail dependence of these variables given Z, the global climate model outputs.

For any two random variables X and Y, which may not be identically distributed, the (upper) tail dependence index is defined as $λ = \lim_{τ \to 1} P \{Y > Q_{τ} (Y) | X > Q_{τ} (X)\},$ where $Q_{τ} (Y)$ and $Q_{τ} (X)$ are the τth quantiles of Y and X, respectively. Dr. Zhang and his collaborators proposed the TQCC for measuring and testing the tail dependence. If there exist confounding variables Z that are related to both X and Y, using the TQCC directly may lead to misleading conclusions. To account for the confounding factors, we can define the following partial tail dependence index $λ_{X Y \cdot Z} (Z) = \lim_{τ \to 1} P \{Y > Q_{τ} (Y | Z) | X > Q_{τ} (X | Z)\},$ where $Q_{τ} (Y | Z)$ and $Q_{τ} (X | Z)$ are the τth conditional quantiles of Y and X given Z, respectively. We conjecture that the index $λ_{X Y \cdot Z} (Z)$ can capture the tail dependence between X and Y after accounting for the effect of Z. It would be interesting to study the interpretation and properties of $λ_{X Y \cdot Z}$, together with its connections to and distinctions from λ.
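The limit defining λ cannot be evaluated from data, but the conditional probability inside it can be estimated at a fixed level τ close to 1. The sketch below does this with empirical quantiles; the function names, the choice τ = 0.95 and the toy dependent pair are assumptions made for illustration and are not taken from the discussion or from Dr. Zhang's paper.

```python
import math
import random


def empirical_quantile(values, tau):
    """Order-statistic estimate of the tau-th quantile (no interpolation)."""
    ordered = sorted(values)
    k = max(0, math.ceil(tau * len(ordered)) - 1)
    return ordered[k]


def empirical_tail_dependence(x, y, tau):
    """Finite-level estimate of P{Y > Q_tau(Y) | X > Q_tau(X)}."""
    qx = empirical_quantile(x, tau)
    qy = empirical_quantile(y, tau)
    # Keep the Y values whose paired X value exceeds its tau-th quantile.
    y_given_x_extreme = [yi for xi, yi in zip(x, y) if xi > qx]
    if not y_given_x_extreme:
        return float("nan")  # no exceedances of X at this level
    return sum(yi > qy for yi in y_given_x_extreme) / len(y_given_x_extreme)


def unit_frechet(rng):
    """Unit Frechet draw: the reciprocal of a standard exponential variable."""
    return 1.0 / rng.expovariate(1.0)


if __name__ == "__main__":
    rng = random.Random(2021)
    n, tau = 5000, 0.95
    x = [unit_frechet(rng) for _ in range(n)]
    noise = [unit_frechet(rng) for _ in range(n)]
    # Hypothetical dependent pair: Y shares the heavy-tailed component of X.
    dependent = [xi + 0.1 * ei for xi, ei in zip(x, noise)]
    print("independent pair:", empirical_tail_dependence(x, noise, tau))
    print("dependent pair:  ", empirical_tail_dependence(x, dependent, tau))
```

For an independent pair the estimate should be close to 1 − τ, while a pair driven by a common heavy-tailed component should give a much larger value. A partial version would replace the two marginal quantiles with estimated conditional quantiles given Z.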

Suppose that we are interested in testing $H_{0} : λ_{X Y \cdot Z} = 0$ against the alternative hypothesis $H_{a} : λ_{X Y \cdot Z} > 0$. There are two possible ways to extend the TQCC-based method in Zhang et al. (2017) to test for partial tail independence. Such tests can also be useful for the selection of orders in autoregressive models; see the discussion of quantile partial correlation in Li et al. (2015) for some related applications.

The first approach is a plug-in method. The idea is to remove the effects of Z on Y and X separately, and then assess the dependence of the estimated residuals. This approach requires specific forms for the two regression models. For instance, we may consider the following location-scale shift linear regression models,

$\begin{aligned} X & = Z^{T} β_{1} + (Z^{T} γ_{1}) ϵ_{1}, \\ Y & = Z^{T} β_{2} + (Z^{T} γ_{2}) ϵ_{2}, \end{aligned}$ (1)

where $(ϵ_{1}, ϵ_{2})$ are random errors, $β_{j}, γ_{j}$ are the unknown location and scale parameters, and $Z^{T} γ_{j} > 0$, j = 1, 2. Given a random sample $(X_{i}, Y_{i}, Z_{i}), i = 1, \dots, n$, we can estimate the parameters $(β_{j}, γ_{j})$ by $({\hat{β}}_{j}, {\hat{γ}}_{j})$ using existing regression methods, for instance, the method in He (1997). Denote ${\hat{ϵ}}_{1 i} = (X_{i} - Z_{i}^{T} {\hat{β}}_{1}) / (Z_{i}^{T} {\hat{γ}}_{1})$ and ${\hat{ϵ}}_{2 i} = (Y_{i} - Z_{i}^{T} {\hat{β}}_{2}) / (Z_{i}^{T} {\hat{γ}}_{2})$. We can then define the partial tail quotient correlation coefficient as

$q_{n 1} = \frac{\max_{1 \leq i \leq n} \left\{ \frac{\max ({\hat{ϵ}}_{1 i}, u_{n})}{\max ({\hat{ϵ}}_{2 i}, u_{n})} - 1 \right\} + \max_{1 \leq i \leq n} \left\{ \frac{\max ({\hat{ϵ}}_{2 i}, u_{n})}{\max ({\hat{ϵ}}_{1 i}, u_{n})} - 1 \right\}}{\max_{1 \leq i \leq n} \left\{ \frac{\max ({\hat{ϵ}}_{1 i}, u_{n})}{\max ({\hat{ϵ}}_{2 i}, u_{n})} \right\} \times \max_{1 \leq i \leq n} \left\{ \frac{\max ({\hat{ϵ}}_{2 i}, u_{n})}{\max ({\hat{ϵ}}_{1 i}, u_{n})} \right\} - 1},$

where $u_{n}$ is a varying threshold that tends to infinity. Under the model assumptions in (1), it can be shown that the partial tail dependence $λ_{X Y \cdot Z}$ is the same as the tail dependence index between $ϵ_{1}$ and $ϵ_{2}$. Indeed, if $(ϵ_{1}, ϵ_{2})$ is independent of Z, then $Q_{τ} (X | Z) = Z^{T} β_{1} + (Z^{T} γ_{1}) Q_{τ} (ϵ_{1})$, so the event $\{X > Q_{τ} (X | Z)\}$ coincides with $\{ϵ_{1} > Q_{τ} (ϵ_{1})\}$, and similarly for Y. One limitation of this approach is that it relies on the location-scale shift model assumption and thus may be susceptible to model misspecification, though we can increase the flexibility by modelling the location and scale functions nonparametrically; see for instance He (1997), Keilegom and Wang (2010) and Pang et al. (2015).
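The plug-in statistic $q_{n 1}$ can be sketched in a few lines once residuals are available. The code below is an illustration only: it takes the location and scale coefficients as given rather than estimating them (the estimation step, e.g. by the method of He (1997), is not implemented here), and the function names are hypothetical.

```python
def standardised_residuals(x, z_rows, beta, gamma):
    """Residuals (x_i - z_i' beta) / (z_i' gamma) for supplied coefficients."""
    residuals = []
    for xi, zi in zip(x, z_rows):
        location = sum(zk * bk for zk, bk in zip(zi, beta))
        scale = sum(zk * gk for zk, gk in zip(zi, gamma))
        if scale <= 0.0:
            raise ValueError("the scale z' gamma must be positive")
        residuals.append((xi - location) / scale)
    return residuals


def partial_tqcc_plugin(e1, e2, u):
    """q_{n1}: tail quotient correlation of two residual series at threshold u > 0."""
    # Largest quotient in each direction after flooring both series at u.
    m12 = max(max(a, u) / max(b, u) for a, b in zip(e1, e2))
    m21 = max(max(b, u) / max(a, u) for a, b in zip(e1, e2))
    denominator = m12 * m21 - 1.0
    if denominator <= 0.0:
        # Degenerate case: every quotient is identical (e.g. no exceedances of u).
        return float("nan")
    return ((m12 - 1.0) + (m21 - 1.0)) / denominator


if __name__ == "__main__":
    # One large value in each series, never at the same index.
    e1 = [1.0, 10.0, 1.0]
    e2 = [1.0, 1.0, 10.0]
    print(partial_tqcc_plugin(e1, e2, u=2.0))
```

When the large values of the two series never coincide, both maximal quotients are large and the ratio is small; when extremes occur together, the quotients stay near 1 and the ratio moves towards 1.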

The second approach is a quantile-regression-based method, which requires modelling the tail conditional quantiles of X and Y given Z. Let ${\hat{Q}}_{τ} (X | Z)$ and ${\hat{Q}}_{τ} (Y | Z)$ be the estimated conditional τth quantiles of X and Y given Z, respectively, which can be obtained from either parametric (Wang et al., 2012) or semiparametric quantile regression (Xu et al., 2020). Then we can define a second version of the partial tail quotient correlation coefficient as

$q_{n 2} = \frac{\max_{1 \leq i \leq n} \left\{ \frac{\max (X_{i}, u_{n} (Z_{i}))}{\max (Y_{i}, v_{n} (Z_{i}))} - 1 \right\} + \max_{1 \leq i \leq n} \left\{ \frac{\max (Y_{i}, v_{n} (Z_{i}))}{\max (X_{i}, u_{n} (Z_{i}))} - 1 \right\}}{\max_{1 \leq i \leq n} \left\{ \frac{\max (X_{i}, u_{n} (Z_{i}))}{\max (Y_{i}, v_{n} (Z_{i}))} \right\} \times \max_{1 \leq i \leq n} \left\{ \frac{\max (Y_{i}, v_{n} (Z_{i}))}{\max (X_{i}, u_{n} (Z_{i}))} \right\} - 1},$

where $u_{n} (Z) = {\hat{Q}}_{τ} (X | Z)$, $v_{n} (Z) = {\hat{Q}}_{τ} (Y | Z)$, and $τ \to 1$ as $n \to \infty$.
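A minimal sketch of $q_{n 2}$ follows, in which each observation is compared with its own covariate-dependent thresholds. The binned empirical quantile used to produce the thresholds is a deliberately crude, hypothetical stand-in for the parametric or semiparametric quantile regression estimators cited above; it assumes a scalar Z and positive thresholds, and none of the function names come from the cited work.

```python
import math


def binned_conditional_quantile(z, w, tau, n_bins=5):
    """Crude estimate of Q_tau(W | Z): the empirical tau-quantile within bins of z."""
    lo, hi = min(z), max(z)
    width = (hi - lo) / n_bins or 1.0  # fall back to one bin if z is constant

    def bin_of(zi):
        return min(int((zi - lo) / width), n_bins - 1)

    groups = {}
    for zi, wi in zip(z, w):
        groups.setdefault(bin_of(zi), []).append(wi)
    cutoffs = {}
    for b, values in groups.items():
        values.sort()
        k = max(0, math.ceil(tau * len(values)) - 1)
        cutoffs[b] = values[k]
    # One threshold per observation, looked up through its bin.
    return [cutoffs[bin_of(zi)] for zi in z]


def partial_tqcc_quantile(x, y, u, v):
    """q_{n2}: x_i is floored at its own threshold u_i and y_i at v_i."""
    m_xy = max(max(xi, ui) / max(yi, vi) for xi, yi, ui, vi in zip(x, y, u, v))
    m_yx = max(max(yi, vi) / max(xi, ui) for xi, yi, ui, vi in zip(x, y, u, v))
    denominator = m_xy * m_yx - 1.0
    if denominator <= 0.0:
        return float("nan")  # degenerate case: all quotients are identical
    return ((m_xy - 1.0) + (m_yx - 1.0)) / denominator


if __name__ == "__main__":
    z = [0.1, 0.2, 0.8, 0.9]
    x = [1.0, 4.0, 2.0, 9.0]
    y = [2.0, 1.5, 8.0, 3.0]
    u = binned_conditional_quantile(z, x, tau=0.5, n_bins=2)
    v = binned_conditional_quantile(z, y, tau=0.5, n_bins=2)
    print(partial_tqcc_quantile(x, y, u, v))
```

Unlike the plug-in version, no location-scale structure is imposed: all model dependence enters through the two threshold functions.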

Like the TQCC, the partial TQCCs $q_{n 1}$ and $q_{n 2}$ can be used to test whether X and Y are tail independent after adjusting for the effect of Z. Theorem 4 in Zhang et al. (2017) establishes the limiting distribution of the TQCC under $H_{0} : λ = 0$. The measure $q_{n 1}$ assumes the location-scale shift models. If the location and scale functions are estimated consistently at a suitable rate, we conjecture that $q_{n 1}$ has asymptotic properties similar to those of $q_{u_{n}}$ in Zhang et al. (2017), so that inference can be conducted using the asymptotic $χ^{2}$ distribution. The asymptotic properties of $q_{n 2}$ would require more careful investigation.

We conduct a small simulation study by generating data from the following model:

$\begin{aligned} X_{i} & = 1 + Z_{i} + ϵ_{i 1}, \\ Y_{i} & = 1 + 2 Z_{i} + (1 + γ Z_{i}) ϵ_{i 2}, \quad i = 1, \dots, n, \end{aligned}$

where $ϵ_{i 2} = σ ϵ_{i 1} + ϵ_{i 3}$, $ϵ_{i 1}$ and $ϵ_{i 3}$ are independent standard Fréchet random variables, and $Z_{i} \sim U (0, 1)$. We let n = 1000 and consider two cases: the homoscedastic case with $γ = 0$ and the heteroscedastic case with $γ = 2$. The simulation is repeated 1000 times for each scenario. For $q_{n 1}$, we take $u_{n}$ as the maximum of the 95th percentiles of the estimated residuals ${\hat{ϵ}}_{i 1}$ and ${\hat{ϵ}}_{i 2}, i = 1, \dots, n$. For $q_{n 2}$, we let $τ = 0.95$. Figure 1 plots the empirical density of $T_{n} = 2 n \{1 - \exp (- 1 / u_{n})\} q_{n 1}$ obtained under the null model with $σ = 0$, together with the density of $χ^{2} (4)$. The results show that, as in Theorem 3.8 of Dr. Zhang's paper for $q_{u_{n}}$, $χ^{2} (4)$ provides a good approximation to the normalised $q_{n 1}$ under $H_{0}$. Figure 2 plots the power curves of the tests based on $q_{n 1}$ and $q_{n 2}$, using Monte Carlo critical values, against σ. The power of both tests increases gradually with σ, while $q_{n 2}$ exhibits higher power than $q_{n 1}$ for detecting the partial tail dependence. It would be interesting to further study the theoretical and empirical properties of $q_{n 2}$.
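One replication of this design can be sketched as follows. To keep the example short, the statistic is computed from the true errors rather than from estimated residuals, which is an assumption of this sketch and skips the regression step used in the study; the function names are also hypothetical. The $χ^{2} (4)$ tail probability uses the closed form $e^{- t / 2} (1 + t / 2)$.

```python
import math
import random


def tqcc(e1, e2, u):
    """Tail quotient correlation of two series at a common threshold u > 0."""
    m12 = max(max(a, u) / max(b, u) for a, b in zip(e1, e2))
    m21 = max(max(b, u) / max(a, u) for a, b in zip(e1, e2))
    denominator = m12 * m21 - 1.0
    if denominator <= 0.0:
        return float("nan")
    return ((m12 - 1.0) + (m21 - 1.0)) / denominator


def percentile(values, tau):
    """Order-statistic estimate of the tau-th quantile."""
    ordered = sorted(values)
    return ordered[max(0, math.ceil(tau * len(ordered)) - 1)]


def simulate(n, sigma, gamma, rng):
    """Draw (X, Y, Z) and the underlying errors from the simulation model."""
    x, y, z, e1, e2 = [], [], [], [], []
    for _ in range(n):
        zi = rng.random()
        a = 1.0 / rng.expovariate(1.0)  # unit Frechet error eps_1
        c = 1.0 / rng.expovariate(1.0)  # unit Frechet error eps_3
        b = sigma * a + c               # eps_2 = sigma * eps_1 + eps_3
        x.append(1.0 + zi + a)
        y.append(1.0 + 2.0 * zi + (1.0 + gamma * zi) * b)
        z.append(zi)
        e1.append(a)
        e2.append(b)
    return x, y, z, e1, e2


def normalised_statistic(e1, e2, tau=0.95):
    """T_n = 2 n {1 - exp(-1/u_n)} q, with u_n the larger tau-th percentile."""
    u_n = max(percentile(e1, tau), percentile(e2, tau))
    return 2.0 * len(e1) * (1.0 - math.exp(-1.0 / u_n)) * tqcc(e1, e2, u_n)


def chi2_4_sf(t):
    """P(chi-square(4) > t) for t >= 0."""
    return math.exp(-t / 2.0) * (1.0 + t / 2.0)


if __name__ == "__main__":
    rng = random.Random(2021)
    _, _, _, e1, e2 = simulate(n=1000, sigma=0.0, gamma=2.0, rng=rng)
    t_n = normalised_statistic(e1, e2)
    print("T_n =", t_n, "approximate p-value =", chi2_4_sf(t_n))
```

Repeating the call over many seeds with σ = 0 and collecting the values of $T_{n}$ gives an empirical null distribution that can be compared with $χ^{2} (4)$.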

Figure 1. The empirical density of $T_{n} = 2 n \{1 - \exp (- 1 / u_{n})\} q_{n 1}$ under $H_{0}$ and the density of $χ^{2} (4)$.

Figure 2. Power of tests based on $q_{n 1}$ and $q_{n 2}$ for testing the partial tail independence between X and Y given Z.

2. Maxima of maxima for high-dimensional inference

In Section 3.2, Dr. Zhang introduces some newly developed extreme value theory for the maxima of k maxima, from either different variables or subsequences of the same variable. Denote $M_{l, n_{l}} = \max \{Y_{l, 1}, \dots, Y_{l, n_{l}}\}$, $l = 1, \dots, k$, and $M_{n} = \max (M_{1, n_{1}}, \dots, M_{k, n_{k}})$. Section 3.2 reviews some new results for the limiting distribution of $M_{n}$. We believe such results would be very useful for high-dimensional inference with multivariate responses.

Maximum-type statistics and extreme value theory have been used in many high-dimensional inference problems. Examples include testing for high-dimensional mean differences (Cai et al., 2014; Xu et al., 2016), inference on high-dimensional correlation matrices (Jiang, 2004; Xiao & Wu, 2013), and testing and identification of significant predictors (Tang & Pan, 2020), to name just a few. In these works, the test statistics are defined as the maximum of a large number of independent or dependent random variables, e.g. the sample correlations (Jiang, 2004), the normalised squared differences of sample covariances of p variables from two populations (Cai et al., 2013), the squared sample mean differences of p variables from two populations (Cai et al., 2014), or the squared score statistics capturing the impacts of p predictors on the response variable (Tang & Pan, 2020; Wu et al., 2019), where p is often larger than the sample size n. Hypothesis testing is then conducted using the limiting distribution of the maximum-type statistic, that is, the Type I extreme value distribution. One typical application is in genome-wide association studies (GWAS), where one main interest is in comparing the means or covariances of a large number of single nucleotide polymorphisms (SNPs) between treatment and control, or detecting possible associations between a phenotype or disease and gene pathways.

In some applications, the researcher may be interested in assessing the association between p SNPs and multiple diseases or phenotypes jointly. The new extreme value theory of maxima of maxima in Dr. Zhang's work could be helpful for developing valid testing procedures for such applications. For simplicity, we will use k = 2 to illustrate the possible application in this context. Let $S_{l, j, n}^{2}$ denote the squared normalised score test statistic measuring the effect of the jth SNP on the lth phenotype, where $j = 1, \dots, p$ and l = 1, 2. Under some regularity conditions, it can often be shown that $S_{l, j, n}$ are asymptotically normal variables that are likely to be correlated. Suppose that we want to test the null hypothesis $H_{0}$: none of the SNPs from a gene pathway is associated with the two phenotypes, against the alternative $H_{A}$: there exists at least one SNP that has an association with either or both phenotypes. One natural test statistic is

$\begin{aligned} M_{n} & = \max (M_{1, n}, M_{2, n}), \quad \text{where} \\ M_{l, n} & = \max_{j = 1, \dots, p} S_{l, j, n}^{2}, \quad l = 1, 2, \end{aligned}$

and we would reject $H_{0}$ for large $M_{n}$. Theorem 3.11 in Dr. Zhang's work can help determine the critical value or calculate the p-value. Let $F_{1}$ and $F_{2}$ be the cumulative distribution functions of $S_{1, j, n}^{2}$ and $S_{2, j, n}^{2}$ for $j = 1, \dots, p$. In this context, $F_{1}$ and $F_{2}$ are approximately $χ^{2} (1)$ distributions under $H_{0}$. Let $m_{1, n}$ and $m_{2, n}$ be the observed values of $M_{1, n}$ and $M_{2, n}$. Define $m_{n} = \max \{m_{1, n}, m_{2, n}\}$. Then, based on Theorem 3.11, we can approximate the p-value with

$\begin{aligned} \text{p-val} & = P (M_{n} > m_{n} | H_{0}) \\ & = 1 - P (M_{1, n} \leq m_{n}, M_{2, n} \leq m_{n} | H_{0}) \\ & \approx 1 - e^{- τ_{1} - τ_{2}}, \end{aligned}$ (2)

where $τ_{1} = n \{1 - F_{1} (m_{1, n})\}$ and $τ_{2} = n \{1 - F_{2} (m_{2, n})\}$.
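A p-value of the form $1 - e^{- τ_{1} - τ_{2}}$ can be sketched as below. The sketch makes its own simplifying choices, which are assumptions rather than a transcription of Theorem 3.11: each $τ_{l}$ is taken as the number of score statistics for phenotype l times the $χ^{2} (1)$ tail probability at the overall maximum $m_{n}$, which is the approximation one obtains when all statistics are independent. The function names are hypothetical, and the $χ^{2} (1)$ tail is computed as $\text{erfc} (\sqrt{x / 2})$.

```python
import math


def chi2_1_sf(x):
    """P(chi-square(1) > x) = P(|N(0,1)| > sqrt(x)) for x >= 0."""
    return math.erfc(math.sqrt(x / 2.0))


def maxima_of_maxima_pvalue(s1, s2):
    """Approximate p-value for M_n = max(max_j s1_j^2, max_j s2_j^2)."""
    m1 = max(v * v for v in s1)
    m2 = max(v * v for v in s2)
    m_n = max(m1, m2)
    # Assumed form: expected number of exceedances of m_n in each block
    # when every squared score is chi-square(1).
    tau1 = len(s1) * chi2_1_sf(m_n)
    tau2 = len(s2) * chi2_1_sf(m_n)
    # 1 - exp(-(tau1 + tau2)), computed stably for small arguments.
    return -math.expm1(-(tau1 + tau2))


if __name__ == "__main__":
    p = 1000
    s1 = [0.0] * (p - 1) + [4.5]  # one large score for phenotype 1
    s2 = [0.0] * p
    print(maxima_of_maxima_pvalue(s1, s2))
```

Under full independence the exact p-value is $1 - F (m_{n})^{2 p}$ with F the $χ^{2} (1)$ distribution function, and the exponential form is its usual large-p approximation; with correlated statistics the approximation would need the justification discussed in the text.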

We conduct a small simulation study to try out this idea. We generate $S_{1, j, n}$ and $S_{2, j, n}$ from the bivariate normal distribution with means b, unit variances and correlation ρ, where $j = 1, \dots, p$ with $p = 1000$. Figure 3 shows the power curves of this test against b for $ρ = 0$ and 0.5. The simulation results suggest that the test based on the extreme value theory for maxima of maxima performs well: the type I error is controlled around the nominal level of 0.05, and the power increases gradually with the signal b. However, as mentioned above, in GWAS applications $S_{l, j, n}$ are often correlated across l and $j = 1, \dots, p$, so further research is needed to provide rigorous justification for applying the maxima of maxima theory to multivariate high-dimensional inference problems.
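A small Monte Carlo check in the spirit of this experiment might look as follows. It is a sketch under stated assumptions: every pair $(S_{1, j, n}, S_{2, j, n})$ is given the same mean b (one reading of the description above), pairs are independent across j, and the p-value uses the independence-based form $1 - \exp \{- 2 p (1 - F (m_{n}))\}$ with F the $χ^{2} (1)$ distribution function. The number of replications and all function names are illustrative.

```python
import math
import random


def chi2_1_sf(x):
    """P(chi-square(1) > x) for x >= 0."""
    return math.erfc(math.sqrt(x / 2.0))


def pvalue(s1, s2):
    """Independence-based approximation to P(M_n > m_n | H_0)."""
    m_n = max(max(v * v for v in s1), max(v * v for v in s2))
    tau_total = (len(s1) + len(s2)) * chi2_1_sf(m_n)
    return -math.expm1(-tau_total)


def draw_scores(p, b, rho, rng):
    """p independent bivariate normal pairs with means b, unit variances, correlation rho."""
    s1, s2 = [], []
    c = math.sqrt(1.0 - rho * rho)
    for _ in range(p):
        g1 = rng.gauss(0.0, 1.0)
        g2 = rng.gauss(0.0, 1.0)
        s1.append(b + g1)
        s2.append(b + rho * g1 + c * g2)
    return s1, s2


def rejection_rate(b, rho, p=1000, reps=200, level=0.05, seed=0):
    """Fraction of replications in which the approximate p-value falls below level."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        s1, s2 = draw_scores(p, b, rho, rng)
        if pvalue(s1, s2) < level:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    for rho in (0.0, 0.5):
        for b in (0.0, 0.5, 1.0):
            print("rho =", rho, "b =", b, "rate =", rejection_rate(b, rho))
```

With b = 0 the rejection rate estimates the type I error, and increasing b traces out a power curve. Dependence across j, which the text flags as typical in GWAS, is not represented in this sketch.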

Figure 3. Power curve of the test based on the maxima of maxima theory for high-dimensional inference with two phenotypes. The parameter ρ corresponds to the correlation used in the simulation. The x-axis represents the deviation from the null hypothesis. The dotted horizontal line corresponds to the 0.05 nominal level.

Acknowledgments

We congratulate Dr. Zhang for a stimulating and interesting article on the important topics of extreme values with nonlinear time series models and tail dependence measures and thank Professor Jun Shao for giving us the opportunity to discuss this work.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Notes on contributors

Wen Xu

Wen Xu is a PhD candidate at Fudan University.

Huixia Judy Wang

Huixia Judy Wang is a Professor at The George Washington University.

References

Cai, T. T., Liu, W., & Xia, Y. (2013). Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. Journal of the American Statistical Association, 108, 265–277. https://doi.org/10.1080/01621459.2012.758041
Cai, T. T., Liu, W., & Xia, Y. (2014). Two-sample test of high dimensional means under dependence. Journal of the Royal Statistical Society, Series B, 76, 349–372. https://doi.org/10.1111/rssb.2014.76.issue-2
He, X. (1997). Quantile curves without crossing. The American Statistician, 51(2), 186–192. https://doi.org/10.1080/00031305.1997.10473959
Jiang, T. (2004). The asymptotic distributions of the largest entries of sample correlation matrices. The Annals of Applied Probability, 14(2), 865–880. https://doi.org/10.1214/105051604000000143
Keilegom, I. V., & Wang, L. (2010). Semiparametric modeling and estimation of heteroscedasticity in regression analysis of cross-sectional data. Electronic Journal of Statistics, 4, 186–192. https://doi.org/10.1214/09-EJS547
Li, G., Li, Y., & Tsai, C. (2015). Quantile correlations and quantile autoregressive modeling. Journal of the American Statistical Association, 110(509), 246–261. https://doi.org/10.1080/01621459.2014.892007
Pang, L., Lu, W., & Wang, H. (2015). Local Buckley-James estimator for the heteroscedastic accelerated failure time model. Statistica Sinica, 25, 863–877. https://doi.org/10.5705/ss.2013.313
Tang, Y., & Pan, Q. (2020). Conditional marginal test for high dimensional quantile regression. Statistica Sinica, to appear.
Wang, H., Li, D., & He, X. (2012). Estimation of high conditional quantiles for heavy-tailed distributions. Journal of the American Statistical Association, 107, 1453–1464. https://doi.org/10.1080/01621459.2012.716382
Wu, C., Xu, G., & Pan, W. (2019). An adaptive test on high-dimensional parameters in generalized linear models. Statistica Sinica, 28, 1226–1255. https://doi.org/10.5705/ss.202017.0354
Xiao, H., & Wu, W. (2013). Asymptotic theory for maximum deviations of sample covariance matrix estimates. Stochastic Processes and their Applications, 123(7), 2899–2920. https://doi.org/10.1016/j.spa.2013.03.012
Xu, W., Li, D., & Wang, H. (2020). Extreme quantile estimation based on the tail single-index model. Statistica Sinica. https://doi.org/10.5705/ss.202020.0051
Xu, G., Lin, L., Wei, P., & Pan, W. (2016). An adaptive two-sample test for high-dimensional means. Biometrika, 103(3), 609–624. https://doi.org/10.1093/biomet/asw029
Zhang, Z., Zhang, C., & Cui, Q. (2017). Random threshold driven tail dependence measures with application to precipitation data analysis. Statistica Sinica, 27(2), 421–453. https://doi.org/10.5705/ss.202014.0421
