Review Article

On studying extreme values and systematic risks with nonlinear time series models and tail dependence measures

Zhengjun Zhang, Department of Statistics, University of Wisconsin-Madison, Madison, WI, USA. Correspondence: [email protected]

ABSTRACT

This review paper discusses advances of statistical inference in modeling extreme observations from multiple sources and heterogeneous populations. The paper starts by briefly reviewing classical univariate/multivariate extreme value theory, tail equivalence, and tail (in)dependence. New extreme value theory for heterogeneous populations is then introduced. Time series models for maxima and extreme observations are the focus of the review. These models naturally form a new system with similar structures. They can be used as alternatives to the widely used ARMA models and GARCH models. These time series models can be applied in many fields. The paper discusses two important applications: systematic risks and extreme co-movements/large scale contagions.

Keywords:

1. Introduction

Extreme value theory and methods are commonly applied in many research fields, e.g., finance, insurance, health, climate, and environmental studies. Many applications can be found in Mikosch et al. (1997), Embrechts et al. (1999), McNeil and Frey (2000), S. Coles et al. (2001), Finkenstädt and Rootzén (2004), Castillo et al. (2005), Salvadori et al. (2007), Dey and Yan (2016), amongst many excellent books. On the theoretical side, Galambos (1987), Leadbetter et al. (1983), Resnick (1987), de Haan (1993), Beirlant et al. (2006) and de Haan and Ferreira (2007) contain many rigorous and fundamental results. In the statistical inference of maximum likelihood estimation of parameters from the extreme value distributions, there have been many developments, e.g., Smith (1985), Drees et al. (2004), Zhou (2008), Bücher and Segers (2017) amongst others. Besides maximum likelihood estimation, other inference methods, e.g., probability weighted moments and the generalised method of moments, have also been developed; they are not detailed here.

In the era of big data, the classical extreme value theory finds its limitations in fitting data generated from multiple sources with complex structures. To attack new challenging problems in extreme value studies, many new methodologies, new models, and new theories have also been developed. Here are some examples. Zhang and Smith (2010) proposed the multivariate maxima of moving maxima (M4) processes and applied the method to model jumps in returns in multivariate financial time series and predicted the extreme co-movements in price returns. Meinguet (2012) studied maxima of moving maxima of continuous functions. Martins and Ferreira (2014) studied the extremal properties of M4 models. Ferreira and Ferreira (2018) constructed estimators for the extremal index through local dependence. Reich and Shaby (2019) proposed a spatial Markov model for climate extremes. Pereira and Fonseca (2019) studied statistical methods for assessing the contagion of spatial extreme events among regions. Deng and Zhang (2018, 2020) studied haze extremes in a vast region in China.

There are many other advances in new theory, methodology, and applications, which are not listed in this review paper. The focus of this review paper is on time series models for maxima and extreme observations and tail dependence modeling. The time series models include moving maxima models in Sections 4.1, 4.3, 4.5, 4.6, max-autoregressive models in Section 4.2, and autoregressive conditional Fréchet models in Section 4.8. The paper also briefly discusses the most recently introduced probability foundations for these advanced statistical models in Section 3. In studying high dimensional extremes and extreme clusters in time series, the core question is how to measure tail dependence between random variables. Section 3 also discusses some of the tail dependence measures proposed in the literature. For completeness, Section 2 briefly reviews classical extreme value theory. Section 5 presents two data examples. Section 6 concludes.

2. Classical extreme value theory: brief review

In this section, we briefly review some fundamental properties in classical extreme value theory. There have been many developments in the field. Many results cannot be discussed in this review section, and readers are referred to the references included and beyond.

2.1. Univariate extreme value theory

2.1.1. Independent sequence

Suppose $\{X_{1}, X_{2}, \dots, X_{n}\}$ is a sequence of independent and identically distributed (i.i.d.) random variables with the distribution function $F(x)$ and let $M_{n} = \max(X_{1}, X_{2}, \dots, X_{n}).$ (1) Then $M_{n}$ has the distribution function $P(M_{n} \leq x) = P(X_{1} \leq x, \dots, X_{n} \leq x) = F^{n}(x).$ (2) It is clear that the maximum of a sample tends almost surely to the right endpoint of the support of the distribution, whether that endpoint is finite or infinite. Throughout the paper, we denote the right endpoint as $x_{F} = \sup\{x \in \mathbb{R} : F(x) < 1\}$ for the distribution function F and similarly for other distribution functions. What we are interested in is the limit form: $\lim_{n \to \infty} F^{n}(a_{n} x + b_{n}) = \lim_{n \to \infty} P\left(\frac{M_{n} - b_{n}}{a_{n}} \leq x\right) = H(x)$ (3) for suitable norming constants $a_{n} > 0$ and $b_{n} \in \mathbb{R}$, where H is a non-degenerate distribution function.

If (3) holds, we say F (or X) belongs to the (maximum) domain of attraction of H and write $F \in \mathrm{MDA}(H)$ (or $X \in \mathrm{MDA}(H)$). H has one of the following three parametric forms (which are generally called extreme value distributions):

Type I: $H(x) = \exp\{-\exp(-x)\}$ for $-\infty < x < \infty$.
Type II: $H(x) = \begin{cases} 0 & \text{if } x \leq 0, \\ \exp(-x^{-α}) & \text{if } x > 0. \end{cases}$
Type III: $H(x) = \begin{cases} \exp(-(-x)^{α}) & \text{if } x < 0, \\ 1 & \text{if } x \geq 0. \end{cases}$

In Types II and III, α is any positive number. The three types are also often called the Gumbel type, Fréchet type and Weibull type, respectively.

The following theorems are very useful in finding the $\mathrm{MDA}(H)$ of F and the suitable norming constants. The proofs of the theorems can be found in Leadbetter et al. (1983), Resnick (1987), Galambos (1987), etc.

Theorem 2.1

Let $0 \leq τ \leq \infty$ and suppose that, for suitable norming constants $a_{n} > 0$ and $b_{n} \in \mathbb{R}$, the sequence $u_{n} = u_{n}(x) = a_{n} x + b_{n}$ satisfies $n(1 - F(u_{n})) \to τ$ as $n \to \infty$. (4) Then $P(M_{n} \leq u_{n}) \to e^{-τ}$ as $n \to \infty$. (5) Conversely, if (5) holds for some τ, $0 \leq τ \leq \infty$, then (4) holds.

Theorem 2.2

The following conditions are necessary and sufficient for the distribution F to belong to the MDA of each type:

Type I: $\int_{0}^{\infty} (1 - F(u))\,du < \infty$ and $\lim_{t \uparrow x_{F}} \frac{1 - F(t + x g(t))}{1 - F(t)} = e^{-x}$ for all real x, where $g(t) = \frac{\int_{t}^{x_{F}} (1 - F(u))\,du}{1 - F(t)}$ for $t < x_{F}$.
Type II: $x_{F} = \infty$ and $\lim_{t \to \infty} \frac{1 - F(t x)}{1 - F(t)} = x^{-α}$, $α > 0$, for each $x > 0$.
Type III: $x_{F} < \infty$ and $\lim_{h \downarrow 0} \frac{1 - F(x_{F} - x h)}{1 - F(x_{F} - h)} = x^{α}$, $α > 0$, for each $x > 0$.

For illustrative purposes, consider the Pareto distribution $F(x) = 1 - κ x^{-α}$, $α > 0$, $κ > 0$, $x \geq κ^{1/α}$. We have $\frac{1 - F(t x)}{1 - F(t)} = \frac{(t x)^{-α}}{t^{-α}} = x^{-α},$ so F belongs to the MDA of a Type II extreme value distribution. By setting $n(1 - F(u_{n})) = τ$, we have $u_{n} = (κ n / τ)^{1/α}$. By putting $τ = x^{-α}$ for $x > 0$, we obtain $u_{n} = (κ n)^{1/α} x$ and hence, by Theorem 2.1, $P((κ n)^{-1/α} M_{n} \leq x) \to \exp(-x^{-α}),$ so $a_{n} = (κ n)^{1/α}$, $b_{n} = 0$. The extreme value distributions are max-stable distributions. We say a non-degenerate distribution H is max-stable if $H^{n}(a_{n} x + b_{n}) = H(x)$ holds for some constants $a_{n} > 0$ and $b_{n} \in \mathbb{R}$ for each $n = 2, 3, \dots$. The next result (Theorem 1.4.1 in Leadbetter et al., 1983) shows the relation.
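The Pareto example can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names, the seed, and the parameter values are illustrative choices and do not come from the paper. It draws Pareto samples by inversion, normalises block maxima by $a_{n} = (κ n)^{1/α}$, and compares their empirical distribution with the Type II limit $\exp(-x^{-α})$.

```python
import numpy as np


def pareto_sample(rng, alpha, kappa, size):
    """Draw from F(x) = 1 - kappa * x**(-alpha), x >= kappa**(1/alpha), by inversion."""
    u = rng.uniform(size=size)  # u lies in [0, 1), so 1 - u lies in (0, 1]
    return (kappa / (1.0 - u)) ** (1.0 / alpha)


def normalised_maxima(rng, alpha, kappa, n, reps):
    """Return `reps` independent copies of M_n / a_n with a_n = (kappa * n)**(1/alpha)."""
    x = pareto_sample(rng, alpha, kappa, size=(reps, n))
    return x.max(axis=1) / (kappa * n) ** (1.0 / alpha)


def frechet_cdf(x, alpha):
    """Type II limit exp(-x**(-alpha)), evaluated for x > 0."""
    return np.exp(-(np.asarray(x, dtype=float) ** (-alpha)))


# Illustrative (assumed) settings: alpha = 2, kappa = 3, blocks of n = 500.
rng = np.random.default_rng(0)
alpha, kappa, n, reps = 2.0, 3.0, 500, 4000
maxima = normalised_maxima(rng, alpha, kappa, n, reps)

grid = np.array([0.5, 1.0, 2.0, 4.0])
empirical = np.array([(maxima <= x).mean() for x in grid])
limit = frechet_cdf(grid, alpha)
max_abs_error = float(np.max(np.abs(empirical - limit)))

print("empirical:", np.round(empirical, 3))
print("limit:    ", np.round(limit, 3))
```

The remaining discrepancy combines Monte Carlo noise with the finite-n approximation error, so it should shrink as both `reps` and `n` grow.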

Theorem 2.3

Every max-stable distribution is of extreme value type, i.e., equal to $H(a x + b)$ for some $a > 0$ and $b \in \mathbb{R}$; conversely, each distribution of extreme value type is max-stable.

The three types of extreme value distributions can be represented by a generalised extreme value (GEV) distribution form (which is very useful for statistical purposes): $H(x; μ, σ, ξ) = \exp\left(-\left[1 + \frac{ξ(x - μ)}{σ}\right]^{-1/ξ}\right),$ (6) where $1 + ξ(x - μ)/σ > 0$, $σ > 0$ and $μ, ξ$ are arbitrary. The case $ξ = 0$ is interpreted as the limit $ξ \to 0$, that is $H(x; μ, σ, 0) = \exp\left(-\exp\left[-\frac{x - μ}{σ}\right]\right).$ (7) Types II and III correspond to $ξ > 0$ $(ξ = \frac{1}{α})$ and $ξ < 0$ $(ξ = -\frac{1}{α})$ respectively. Smith (1990) has a detailed review of statistical treatments, applications and estimations, of the GEV.
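As a hedged illustration of (6) and (7), the sketch below evaluates the GEV distribution function with NumPy, treating $ξ = 0$ as the Gumbel limit and assigning the values 0 or 1 outside the support. The function name `gev_cdf` and its defaults are assumptions made for this example. With $μ = 1$, $σ = ξ = 1/α$ the bracket in (6) reduces to x, which recovers the Type II form $\exp(-x^{-α})$; with $μ = -1$, $σ = 1/α$, $ξ = -1/α$ it reduces to $-x$, which recovers Type III.

```python
import numpy as np


def gev_cdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """GEV distribution function H(x; mu, sigma, xi) as in (6), with (7) for xi == 0."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    x = np.asarray(x, dtype=float)
    z = (x - mu) / sigma
    if xi == 0.0:
        return np.exp(-np.exp(-z))  # Gumbel case, equation (7)
    t = 1.0 + xi * z
    inside = t > 0  # support condition 1 + xi * (x - mu) / sigma > 0
    safe_t = np.where(inside, t, 1.0)  # placeholder avoids invalid powers off-support
    cdf = np.exp(-(safe_t ** (-1.0 / xi)))
    # Off the support: 0 below the lower endpoint (xi > 0), 1 above the upper endpoint (xi < 0).
    return np.where(inside, cdf, 0.0 if xi > 0 else 1.0)


alpha = 3.0
frechet_value = float(gev_cdf(2.0, mu=1.0, sigma=1.0 / alpha, xi=1.0 / alpha))
weibull_value = float(gev_cdf(-0.5, mu=-1.0, sigma=1.0 / alpha, xi=-1.0 / alpha))
print(frechet_value, np.exp(-(2.0 ** (-alpha))))
print(weibull_value, np.exp(-(0.5 ** alpha)))
```

Note that some libraries parameterise the shape with the opposite sign, so the sign convention should be checked before comparing against library output.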

2.1.2. Stationary sequence

Suppose now $\{X_{i}, i = 1, 2, \dots\}$ is a stationary sequence with a continuous marginal distribution function $F(x)$ and $\{\hat{X}_{i}, i = 1, 2, \dots\}$ is the so-called associated sequence of i.i.d. random variables with the same marginal distribution function F. $M_{n}$ stands for the maximum as usual, defined by (1), while $\hat{M}_{n}$ denotes the corresponding maximum of $\{\hat{X}_{1}, \dots, \hat{X}_{n}\}$. The limit distribution of $M_{n}$ can be related to the limit distribution of $\hat{M}_{n}$ via a quantity θ defined below.

Suppose that for every $τ > 0$ there exists a sequence of thresholds $\{u_{n}\}$ such that $P(\hat{M}_{n} \leq u_{n}) \to e^{-τ}$ (8) and, under quite mild additional conditions, $P(M_{n} \leq u_{n}) \to e^{-θτ}.$ (9) Then θ is called the extremal index of the sequence $\{X_{n}\}$. This concept originated in papers by Cartwright (1958), Newell (1964), Loynes (1965) and O'Brien (1974). Leadbetter (1983) gave a formal definition.

The index θ can take any value in [0,1], and $\frac{1}{θ}$ is interpreted as the mean cluster size of exceedances over a high threshold. The case $θ = 0$ corresponds to strong dependence (infinite cluster sizes), but not so strong that all the values can be the same. The case $θ = 1$ is a form of asymptotic independence of extremes, but it does not mean that the original sequence is independent.

If (9) holds for some τ and corresponding $\{u_{n}\}$, then it holds for all $τ'$ (equal or not equal to τ) and its corresponding $\{u_{n}'\}$. Estimators of the extremal index have been proposed by Leadbetter et al. (1989), Nandagopalan (1990) and Hsing (1993). Smith and Weissman (1994) gave a review of estimating the extremal index and proposed two estimating methods, i.e., the blocks method and the runs method. Other references include Chapter 8 in the book by Embrechts et al. (1997).
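To make the cluster interpretation of θ concrete, here is a minimal sketch of one common form of the runs estimator: the proportion of exceedances of a threshold u that are followed by r consecutive non-exceedances. The test process, the threshold (an empirical 99% quantile), the run length, and all names are assumptions chosen for illustration; they are not the settings of Smith and Weissman (1994). The process $X_{i} = \max(Z_{i}, Z_{i-1})/2$ with i.i.d. unit Fréchet $Z_{i}$ is used because extreme values arrive in pairs, so its extremal index is 1/2.

```python
import numpy as np


def runs_estimator(x, u, r=1):
    """Share of exceedances of u followed by r consecutive non-exceedances."""
    exceed = np.asarray(x, dtype=float) > u
    n = exceed.size
    if not 1 <= r < n:
        raise ValueError("run length r must satisfy 1 <= r < len(x)")
    head = exceed[: n - r]  # positions that have r successors inside the sample
    n_exceed = int(head.sum())
    if n_exceed == 0:
        raise ValueError("no exceedances of u; lower the threshold")
    quiet = np.ones(n - r, dtype=bool)
    for k in range(1, r + 1):
        quiet &= ~exceed[k : n - r + k]  # x[i + k] <= u for every k = 1, ..., r
    return float((head & quiet).sum()) / n_exceed


# Assumed test process: moving maximum of i.i.d. unit Frechet noise.
rng = np.random.default_rng(1)
n = 200_000
z = 1.0 / rng.standard_exponential(n + 1)  # unit Frechet: P(Z <= z) = exp(-1/z)
x = np.maximum(z[1:], z[:-1]) / 2.0  # unit Frechet margins, clusters of size two
u = np.quantile(x, 0.99)
theta_hat = runs_estimator(x, u, r=1)
print(round(theta_hat, 3))
```

In practice the estimate depends on both u and r, so it is usually inspected over a range of thresholds rather than at a single value.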

2.2. Multivariate extreme value theory

2.2.1. Independent sequence

Suppose $\{X_{i} = (X_{i1}, \dots, X_{iD}), i = 1, 2, \dots\}$ is a D-dimensional i.i.d. random process with distribution $F(x) = F(x_{1}, \dots, x_{D}) = P(X_{id} \leq x_{d}, d = 1, \dots, D)$ and marginal distributions $F_{d}(x_{d}) = P(X_{id} \leq x_{d}), d = 1, \dots, D$. Let $M_{n} = (M_{n1}, \dots, M_{nD})$ denote the vector of pointwise maxima, where $M_{nd} = \max\{X_{id}, 1 \leq i \leq n\}$. If there exist norming constants $a_{n} = (a_{n1}, \dots, a_{nD}) > 0$ and $b_{n} = (b_{n1}, \dots, b_{nD}) \in \mathbb{R}^{D}$ such that $P(M_{n} \leq a_{n} x + b_{n}) = P(M_{nd} \leq a_{nd} x_{d} + b_{nd}, d = 1, \dots, D) = F^{n}(a_{n1} x_{1} + b_{n1}, a_{n2} x_{2} + b_{n2}, \dots, a_{nD} x_{D} + b_{nD}) = F^{n}(a_{n} x + b_{n}) \to H(x)$ (10) as $n \to \infty$, where the limit distribution H is such that each marginal $H_{d}, d = 1, \dots, D$, is non-degenerate (and hence must be in the GEV family), then the distribution H is called a D-dimensional multivariate extreme value distribution, and F is said to belong to the domain of attraction of H, which we write $F \in \mathrm{MDA}(H)$.

These distributions received theoretical consideration in works dating back to the 1970s and 1980s by de Haan and Resnick (1977), de Haan (1985), Pickands (1981) and Resnick (1987). In the characterisation of the multivariate extreme value distribution, like the univariate case, max-stable (or min-stable) distributions play a central role. We say a distribution $H(x)$ is max-stable if for every $t > 0$ there exist functions $α(t) > 0$ and $β(t) \in \mathbb{R}^{D}$ such that $H^{t}(x) = H(α(t) x + β(t)) = H(α_{1}(t) x_{1} + β_{1}(t), \dots, α_{D}(t) x_{D} + β_{D}(t)).$ (11) The following theorem describes the equivalence between multivariate extreme value distributions and max-stable distributions.

Theorem 2.4

The class of multivariate extreme value distributions is precisely the class of max-stable distribution functions with non-degenerate marginals.

This is Proposition 5.9 in Resnick (1987). After slight modification of Pickands' representation of a min-stable multivariate exponential into a representation for a max-stable multivariate Fréchet distribution, we have

Theorem 2.5

Suppose $H(x)$ is a limit distribution satisfying (10) with unit Fréchet marginals. Then $H(x) = \exp\left\{-\int_{S_{D}} \max_{1 \leq i \leq D}\left(\frac{w_{i}}{x_{i}}\right) dG(w)\right\},$ (12) where G is a positive finite measure on the unit simplex $S_{D} = \left\{(w_{1}, \dots, w_{D}) : \sum_{i=1}^{D} w_{i} = 1, w_{i} \geq 0, i = 1, \dots, D\right\},$ and G satisfies $\int_{S_{D}} w_{i}\, dG(w) = 1, \quad i = 1, \dots, D.$ (13)

Note that $v(x) = \int_{S_{D}} \max_{1 \leq i \leq D}\left(\frac{w_{i}}{x_{i}}\right) dG(w)$ is called the exponent measure by de Haan and Resnick (1977).

2.2.2. Stationary sequence

Some of the results for the univariate stationary sequences can be extended to the multivariate context. Suppose now $\{X_{i} = (X_{i1}, \dots, X_{iD}), i = 1, 2, \dots\}$ is a D-dimensional stationary stochastic process with distribution function F and marginals $F_{d}$. Also let $\{\hat{X}_{i}\}$ be the associated sequence of i.i.d. random vectors having the same distribution function F. $M_{n}$ and $\hat{M}_{n}$ are the pointwise maxima of $\{X_{i}\}$ and $\{\hat{X}_{i}\}$ respectively. Suppose the limits $\lim_{n \to \infty} P(M_{n1} \leq u_{n1}, \dots, M_{nD} \leq u_{nD}) = H(τ)$ and $\lim_{n \to \infty} P(\hat{M}_{n1} \leq u_{n1}, \dots, \hat{M}_{nD} \leq u_{nD}) = \hat{H}(τ)$ (14) both exist and are nonzero. Then a quantity that Nandagopalan (1990, 1994) called the multivariate extremal index relates the extreme value properties of a stationary process to those of an i.i.d. sequence. The multivariate extremal index $θ(τ)$ is defined by $H(τ) = \hat{H}(τ)^{θ(τ)},$ (15) where $θ(τ)$ satisfies

$0 \leq θ(τ) \leq 1$ for all $τ$,
$θ(0, \dots, 0, τ_{d}, 0, \dots, 0) = θ_{d}$ for $τ_{d} > 0$, where $θ_{d}$ is the extremal index of the $d$th component process,
$θ(c τ) = θ(τ)$ for all $c > 0$ (Theorem 1.1 of Nandagopalan, 1994).

Smith and Weissman (1996) pointed out that these properties are not sufficient to characterise the function $θ(τ)$. They also gave two reasons why a more precise characterisation is needed, one that covers a much broader range of processes and corresponds to real stochastic processes, for instance, the multivariate maxima of moving maxima processes which will be reviewed later. The first reason is that ‘the number of examples for which the multivariate extreme index has been calculated is currently very small (Nandagopalan, 1994; Weissman, 1994) and it is important to be able to extend this class to cover a much broader range of processes’. The second reason is that ‘why we need a characterisation is statistical: crude estimators of $θ(τ)$ are easy to construct, but would not correspond to multivariate extreme index of any real stochastic process’.

2.2.3. The copula representations of multivariate extreme value distributions

In this subsection, we study some basic properties of multivariate extreme value (MEV) distribution functions. The following two lemmas are very general, not restricted to MEV, and they are Theorems 5.1.1 and 5.2.1 in Galambos (1987).

Lemma 2.6

Let $F(x)$ be a D-dimensional distribution function with marginals $F_{d}(x)$, $1 \leq d \leq D$. Then, for all $x_{1}, x_{2}, \dots, x_{D}$, $\max\left(0, \sum_{d=1}^{D} F_{d}(x_{d}) - D + 1\right) \leq F(x_{1}, x_{2}, \dots, x_{D}) \leq \min(F_{1}(x_{1}), F_{2}(x_{2}), \dots, F_{D}(x_{D})).$

Lemma 2.7

Let $F_{n}(x)$ be a sequence of D-dimensional distribution functions and let $F_{nd}(x_{d})$ be the dth univariate marginal of $F_{n}(x)$. If $F_{n}(x)$ converges weakly to a nondegenerate continuous distribution function $F(x)$, then, for each d with $1 \leq d \leq D$, $F_{nd}(x_{d})$ converges weakly to the dth marginal $F_{d}(x_{d})$ of $F(x)$.

The copula, or dependence function, is a very useful concept in the investigation of limit distributions for normalised extremes. It is a multivariate distribution with all marginals being uniform $U(0, 1)$.

Definition 2.8

Let $F (x)$ be a D-dimensional distribution function, with dth univariate margin $F_{d}$ . The copula associated with F, is a distribution function $C : [0, 1]^{D} \to [0, 1]$ that satisfies $F (x_{1}, x_{2}, \dots, x_{D}) = C [F_{1} (x_{1}), F_{2} (x_{2}), \dots, F_{D} (x_{D})] .$ Write $C_{F} = C_{F} (y) = C (y)$ over the unit cube $0 \leq y_{d} \leq 1, 1 \leq d \leq D$ .

Based on the function $C (y)$ , we now re-state theorems which connect the univariate marginals and the multivariate or dependence structure of the limit distributions.

Theorem 2.9

If (10) holds, then the dependence function $C_{H}$ of the limit $H(x)$ satisfies $C_{H}^{k}(y_{1}^{1/k}, y_{2}^{1/k}, \dots, y_{D}^{1/k}) = C_{H}(y_{1}, y_{2}, \dots, y_{D}),$ where $k \geq 1$ is an arbitrary integer. (This is Theorem 5.2.1 of Galambos, 1987.)
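The stability condition of Theorem 2.9 is easy to probe numerically. The sketch below is an illustration under assumptions: it uses the Gumbel-Hougaard copula as an example of a copula that satisfies the condition and the Clayton copula as one that does not. Neither family is discussed at this point in the text, and the function names and parameter values are hypothetical.

```python
import numpy as np


def gumbel_copula(y, theta):
    """Gumbel-Hougaard copula exp(-(sum_d (-log y_d)**theta)**(1/theta)), theta >= 1."""
    y = np.asarray(y, dtype=float)
    s = np.sum((-np.log(y)) ** theta)
    return float(np.exp(-(s ** (1.0 / theta))))


def clayton_copula(y, theta):
    """Clayton copula (sum_d y_d**(-theta) - D + 1)**(-1/theta), theta > 0."""
    y = np.asarray(y, dtype=float)
    return float((np.sum(y ** (-theta)) - (y.size - 1)) ** (-1.0 / theta))


def stability_gap(copula, y, k):
    """|C(y_1**(1/k), ..., y_D**(1/k))**k - C(y_1, ..., y_D)| from Theorem 2.9."""
    y = np.asarray(y, dtype=float)
    return abs(copula(y ** (1.0 / k)) ** k - copula(y))


y = np.array([0.3, 0.8])  # a point in the open unit square
gumbel_gaps = [stability_gap(lambda v: gumbel_copula(v, 2.5), y, k) for k in (2, 5, 10)]
clayton_gap = stability_gap(lambda v: clayton_copula(v, 2.0), np.array([0.5, 0.5]), 2)
print(gumbel_gaps)  # numerically zero: the condition holds
print(clayton_gap)  # clearly positive: the condition fails
```

A vanishing gap at a few points does not prove the condition, but a clearly positive gap at any single point is enough to rule a copula out as the dependence function of a limit in (10).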

Theorem 2.10

A D-dimensional distribution function $H(x)$ is a limit in (10) if and only if its univariate marginals are of the same type as one of the three types of extreme value distributions and its copula $C_{H}$ satisfies the condition of Theorem 2.9. (This is Theorem 5.2.4 of Galambos, 1987.)

Theorem 2.10 tells us that, in principle, to determine $a_{n}$ and $b_{n}$ we just need to determine their components from the marginal limit convergence forms. Let us look at a simple example to illustrate how Theorem 2.10 works.

Example 2.1

Let $(X, Y)$ have a bivariate exponential distribution function $F(x, y)$ with unit exponential marginals. If $\frac{M_{n} - b_{n}}{a_{n}}$ converges weakly to a nondegenerate distribution function $H(x, y)$, we can choose $b_{n} = (\log n, \log n)$ and $a_{n} = (1, 1)$.

For finding $H(x)$ functions, there are many copula dependence theories and examples in Joe (2014); see also Zhang (2009) for constructing extreme value copulas, and Yang et al. (2011) for a flexible MGB2 copula family. In Section 4, copulas will be embedded in time series models for extreme values and tail dependent observations.

3. Recent advances on tail (in)dependence and new extreme value theory

From Section 2.2, we can see that the limit multivariate extreme value distribution does not exist in a unified parametric form. To model a multivariate extreme value distribution function is in fact to model the measure function G in (12). de Haan (1985) gave a simple nonparametric procedure for modeling the measure function. S. G. Coles and Tawn (1991) argued that parametric models are preferable when one wants to simultaneously estimate the exponent measure and the dependence structure.

In parametric modeling, how well the dependence between two random variables in the tails is identified determines how good the chosen model is. In the next subsection, we discuss tail dependence, its probabilistic properties, and its statistical developments.

3.1. Tail equivalence and tail (in)dependence

Definition 3.1

Two identically distributed random variables X and Y with distribution function F are called tail independent if $λ = \lim_{u \to x_{F}} P(Y > u \mid X > u)$ (16) is 0. The quantity λ, if it exists, is called the bivariate tail dependence index; it quantifies the amount of dependence of the bivariate upper tails. If $λ > 0$, X and Y are called tail dependent, and we say that there are extreme co-movements between X and Y in time series modeling and inference.

Besides the definition of tail (in)dependence, the terms asymptotic (in)dependence and extremal (in)dependence have also been used in the literature. The term asymptotic independence is used more in mathematics, while the other two are used more in applications. Sometimes, the upper tail dependence may also be regarded as the tail dependence. In many applications, they are used interchangeably. Sibuya (1959) introduced the idea of asymptotic independence between two random variables with identical marginal distributions, and de Haan and Resnick (1977) extended it to the multivariate case; see also S. Coles et al. (1999). Examples of tail dependence indices of bivariate random variables were presented in Embrechts et al. (2002). For instance, the tail dependence index of a bivariate normal (Gaussian) random vector is zero as long as the corresponding correlation coefficient is less than one; the tail dependence index of a bivariate t random vector with a positive correlation is greater than zero. Many financial analysts, for example Salmon (2012), blamed a mathematical formula, the Gaussian copula, as the major cause of the 2007–2008 financial crisis, mainly because Gaussian random variables are tail independent. This example indicates that tail (in)dependence modeling is of practical importance; see also Embrechts et al. (2002) for properties and pitfalls of correlations and dependence measures. Zhang (2005, 2008b) extended the definition of tail dependence between two random variables to lag-k tail dependence of a sequence of random variables with identical marginal distribution. The definition of lag-k tail dependence for a sequence of random variables is given below.

Definition 3.2

A sample $\{X_{1}, X_{2}, \dots, X_{n}\}$ is called lag-k tail dependent if $λ_{k} = \lim_{u \to x_{F}} P(X_{1} > u \mid X_{k+1} > u) > 0$ and $\lim_{u \to x_{F}} P(X_{1} > u \mid X_{k+j} > u) = 0$ for $j > 1$. (17) Then $λ_{k}$ is called the lag-k tail dependence index.

When $λ = 0$, the joint limit distribution of bivariate maxima is the product of marginal limit distributions. The following Proposition 3.3 is from Proposition 5.27 in Resnick (1987).

Proposition 3.3

Suppose $\{X_{i} = (X_{i1}, \dots, X_{iD}), i = 1, 2, \dots\}$ is a D-dimensional i.i.d. random process with a common distribution F and a common marginal distribution $F_{d}(x) = F_{1}(x)$ for $d = 2, \dots, D$. Let $M_{n} = (M_{n1}, \dots, M_{nD})$ denote the vector of pointwise maxima, where $M_{nd} = \max\{X_{id}, 1 \leq i \leq n\}$. Suppose $F_{1}$ is in the domain of attraction of some univariate extreme value distribution $G_{1}(x)$, i.e., there exist $a_{n} > 0$, $b_{n} \in \mathbb{R}$ such that $F_{1}^{n}(a_{n} x + b_{n}) \to G_{1}(x)$. Let $\mathbf{1}$ denote the D-vector of ones. The following are equivalent.

F is in the domain of attraction of a product measure: $F^{n}(a_{n} x + b_{n} \mathbf{1}) = P(M_{n} \leq a_{n} x + b_{n} \mathbf{1}) \to \prod_{i=1}^{D} G_{1}(x_{i}).$
For all $1 \leq i < j \leq D$, $P(M_{ni} \leq a_{n} x_{i} + b_{n}, M_{nj} \leq a_{n} x_{j} + b_{n}) \to G_{1}(x_{i}) G_{1}(x_{j}).$
For $x_{k}$ such that $G_{1}(x_{k}) > 0$, $1 \leq k \leq D$, $\lim_{n \to \infty} n P(X_{1i} > a_{n} x_{i} + b_{n}, X_{1j} > a_{n} x_{j} + b_{n}) = 0$ for all $1 \leq i < j \leq D$.
For any $1 \leq i < j \leq D$, $\lim_{t \to x_{F_{1}}} P(X_{1i} > t, X_{1j} > t) / P(X_{11} > t) = 0.$

From this proposition, we can see that identifying $λ = 0$ or not is a very important task as it concerns the final form of the limit distribution. When $λ = 0$ is confirmed, we just need to find the univariate limit, i.e., not the joint dependence structure.

In practice, dependent random variables are not necessarily tail dependent. It is thus important to check or test whether any two sequences of data are tail dependent or tail independent before choosing a certain class of models for the data. In statistical modeling of tail dependent variables, a significant step is due to Ledford and Tawn (1996, 1997). They introduced a class of models for tail dependence and near tail independence, and constructed test statistics for the null hypothesis of tail dependence using the coefficient of tail dependence (denoted η); see Heffernan (2001) for a directory of coefficients of tail dependence. Peng (1999) constructed a non-parametric estimator for η and a test statistic for testing the hypothesis of tail dependence. In contrast to their null hypothesis, Zhang (2008b) and Zhang et al. (2017) introduced an empirically efficient test statistic for the null hypothesis of tail independence based on the tail quotient correlation coefficient (TQCC), where the underlying threshold can be a constant and/or a random variable that diverges to infinity. We note that the null and alternative hypotheses in Ledford and Tawn (1996, 1997) are reversed in Zhang (2008b), Hüsler and Li (2009) and Zhang et al. (2017). Next, we introduce the TQCC and its properties.

In the literature, Pearson's linear correlation coefficient ρ can be interpreted in thirteen ways (Rodgers & Nicewander, 1988). We now consider a new way of relating ρ to a simple form of variable decomposition.

Example 3.1

Suppose a bivariate random vector $(X, Y)$ can be expressed as $X = a_{1} Z_{1} + a_{2} Z_{2}, Y = b_{1} Z_{1} + b_{2} Z_{2},$ where $a_{1}^{2} + a_{2}^{2} = 1$ , $b_{1}^{2} + b_{2}^{2} = 1$ , $Z_{1}$ and $Z_{2}$ are independent standard normal random variables. Then $ρ = a_{1} b_{1} + a_{2} b_{2} .$

Analogous to Example 3.1, which is based on the stable law of random variables, we construct an extreme value type example based on the max-stable law of random variables.

Example 3.2

Suppose a bivariate random vector $(X, Y)$ can be expressed as $X = max (a_{1} Z_{1}, a_{2} Z_{2}), Y = max (b_{1} Z_{1}, b_{2} Z_{2}),$ where $a_{1}, b_{1}, a_{2}, b_{2}$ are nonnegative satisfying $a_{1} + a_{2} = 1$ , $b_{1} + b_{2} = 1$ , $Z_{1}$ and $Z_{2}$ are independent unit Fréchet random variables with the distribution function $F (x) = \exp (- 1 / x)$ for x>0. Then $λ = min (a_{1} + b_{2}, a_{2} + b_{1}) .$
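The expression for λ can be verified directly; the following short derivation is a sketch under the assumptions of Example 3.2 and is not taken verbatim from the source. Since $Z_{1}$ and $Z_{2}$ are independent unit Fréchet, $P(X \leq x, Y \leq y) = P(Z_{1} \leq \min(x/a_{1}, y/b_{1}))\, P(Z_{2} \leq \min(x/a_{2}, y/b_{2})) = \exp\{-\max(a_{1}/x, b_{1}/y) - \max(a_{2}/x, b_{2}/y)\}$, with the convention $x/0 = \infty$. Setting $y = \infty$ or $x = \infty$ shows that both margins are unit Fréchet because $a_{1} + a_{2} = b_{1} + b_{2} = 1$. Writing $c = \max(a_{1}, b_{1}) + \max(a_{2}, b_{2})$, we get $P(X > u, Y > u) = 1 - 2 e^{-1/u} + e^{-c/u}$, so that $λ = \lim_{u \to \infty} \frac{1 - 2 e^{-1/u} + e^{-c/u}}{1 - e^{-1/u}} = 2 - c = \min(a_{1}, b_{1}) + \min(a_{2}, b_{2})$. If $a_{1} \leq b_{1}$ then $a_{2} \geq b_{2}$ and the last expression equals $a_{1} + b_{2} \leq a_{2} + b_{1}$; the case $a_{1} \geq b_{1}$ is symmetric. In both cases $λ = \min(a_{1} + b_{2}, a_{2} + b_{1})$.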

The sample based correlation coefficient $r_{n}$ of a sequence of bivariate observations $(X_{1}, Y_{1}), \dots, (X_{n}, Y_{n})$ with both $X_{i}$ and $Y_{i}$ having finite second moment (not necessarily normally distributed) can be expressed as an inner product of two normalised random vectors: $r_{n} = \left\langle \frac{X - \bar{X} 1_{n}}{\| X - \bar{X} 1_{n} \|_{2}}, \frac{Y - \bar{Y} 1_{n}}{\| Y - \bar{Y} 1_{n} \|_{2}} \right\rangle, \quad r_{n} \overset{P}{\to} ρ,$ (18) where $X = (X_{1}, \dots, X_{n})$, $Y = (Y_{1}, \dots, Y_{n})$, and $\bar{X}$ and $\bar{Y}$ are the sample means of the $X_{i}$'s and $Y_{i}$'s respectively. $1_{n}$ is a vector with all elements being 1.
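As a small check of the inner-product form of the sample correlation and of the decomposition in Example 3.1, the sketch below simulates $X = a_{1} Z_{1} + a_{2} Z_{2}$ and $Y = b_{1} Z_{1} + b_{2} Z_{2}$ and compares $r_{n}$ with $a_{1} b_{1} + a_{2} b_{2}$. The coefficient values, seed, and function name are assumptions chosen for illustration.

```python
import numpy as np


def sample_correlation(x, y):
    """r_n as the inner product of the centred, unit-norm data vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(np.dot(xc / np.linalg.norm(xc), yc / np.linalg.norm(yc)))


# Assumed coefficients with a1**2 + a2**2 = 1 and b1**2 + b2**2 = 1.
a = np.array([0.6, 0.8])
b = np.array([1.0, 0.0])

rng = np.random.default_rng(2)
z = rng.standard_normal(size=(200_000, 2))  # independent standard normal Z1, Z2
x = z @ a
y = z @ b

r_n = sample_correlation(x, y)
rho = float(a @ b)  # a1 * b1 + a2 * b2
print(round(r_n, 4), rho)
```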

Continuing Example 3.2, assume that a sequence of independent bivariate random variables $(X_{i}, Y_{i})$ can be decomposed as $X_{i} = \max(a_{1} Z_{i1}, a_{2} Z_{i2})$, $Y_{i} = \max(b_{1} Z_{i1}, b_{2} Z_{i2}),$ where $(Z_{i1}, Z_{i2}), i = 1, 2, \dots, n$, are an independent array of unit Fréchet random variables. Then a quotient correlation coefficient is defined as $q_{n} = \frac{\max_{i \leq n}\{Y_{i}/X_{i} - 1\} + \max_{i \leq n}\{X_{i}/Y_{i} - 1\}}{\max_{i \leq n}\{Y_{i}/X_{i}\} \times \max_{i \leq n}\{X_{i}/Y_{i}\} - 1}, \quad q_{n} \overset{P}{\to} λ.$ (19) The quantities $\max_{i \leq n}\{Y_{i}/X_{i} - 1\}$ and $\max_{i \leq n}\{X_{i}/Y_{i} - 1\}$ in $q_{n}$ are asymptotically positive and are interpreted as the maximum relative errors of the $X_{i}$'s to the $Y_{i}$'s and of the $Y_{i}$'s to the $X_{i}$'s, respectively.
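The quotient correlation coefficient can be tried out on simulated data from the max-linear decomposition. The sketch below is a hedged illustration: the weights $a_{1} = 0.3$, $b_{1} = 0.6$, the sample size, the fixed threshold used for the crude empirical conditional probability, and all function names are assumptions, not values from the paper. It compares $q_{n}$ and an empirical version of $P(Y > u \mid X > u)$ with $λ = \min(a_{1} + b_{2}, a_{2} + b_{1})$.

```python
import numpy as np


def simulate_max_linear(rng, a1, b1, n):
    """X = max(a1*Z1, (1-a1)*Z2), Y = max(b1*Z1, (1-b1)*Z2) with unit Frechet Z1, Z2."""
    z = 1.0 / rng.standard_exponential(size=(n, 2))  # unit Frechet draws
    x = np.maximum(a1 * z[:, 0], (1.0 - a1) * z[:, 1])
    y = np.maximum(b1 * z[:, 0], (1.0 - b1) * z[:, 1])
    return x, y


def quotient_correlation(x, y):
    """q_n of (19) for positive observations."""
    r_yx = np.max(y / x)
    r_xy = np.max(x / y)
    denom = r_yx * r_xy - 1.0
    if denom <= 0.0:
        raise ValueError("all ratios Y_i / X_i are equal; q_n is undefined")
    return float((r_yx + r_xy - 2.0) / denom)


def empirical_tail_dependence(x, y, u):
    """Crude estimate of P(Y > u | X > u) at a fixed high threshold u."""
    exceed_x = x > u
    if not exceed_x.any():
        raise ValueError("no exceedances of u; lower the threshold")
    return float(np.mean(y[exceed_x] > u))


a1, b1 = 0.3, 0.6  # assumed weights; a2 = 0.7, b2 = 0.4
lam = min(a1 + (1.0 - b1), (1.0 - a1) + b1)

rng = np.random.default_rng(3)
x, y = simulate_max_linear(rng, a1, b1, n=200_000)
q_n = quotient_correlation(x, y)
lam_hat = empirical_tail_dependence(x, y, u=100.0)
print(lam, round(q_n, 6), round(lam_hat, 3))
```

For this particular model the ratios $Y_{i}/X_{i}$ are bounded and attain their bounds with positive probability, so $q_{n}$ settles on λ far faster than the threshold-based estimate does; that behaviour is specific to exact max-linear data and should not be expected in general.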

Looking at (18), we can see that $|r_{n}|$ is associated with the absolute errors of the $X_{i}$'s to the center of the $X_{i}$'s and of the $Y_{i}$'s to the center of the $Y_{i}$'s. Clearly $r_{n}$ and $q_{n}$ measure different variable dependencies. Zhang et al. (2011) proved that $r_{n}$ and $q_{n}$ are asymptotically independent and demonstrated that a combination of them outperforms many popular test statistics for testing the hypothesis of independence.

We note that the definition of $q_{n}$ requires that $X_{i}$ and $Y_{i}$ be identically distributed as unit Fréchet. In fact, the definition can be extended to any positive random variables. In terms of the definition of λ in (16), Heffernan et al. (2007) showed that X and Y do not have to be identically distributed as long as they are tail equivalent in the sense of the following Lemma 3.4, which is Lemma 14 in Heffernan et al. (2007).

Lemma 3.4

Suppose X and Y satisfy $P (X > x) / P (Y > x) \to 1$ as x tends to infinity. Let $Y^{'}$ be the marginally transformed random variable of Y, i.e., $Y^{'} = G (Y)$ for some monotone increasing function G, such that $Y^{'}$ has the same distribution as X. Then $\begin{aligned} \lim_{x \to \infty} P (Y > x \mid X > x) = \lim_{x \to \infty} P (Y^{'} > x \mid X > x) \end{aligned}$ (20) as long as one of the above two limits exists.

Using the tail equivalence, $q_{n}$ can be extended to the tail quotient correlation coefficient (TQCC) (Zhang et al., 2017), defined next.

Definition 3.5

If $\{(X_{i}, Y_{i})\}_{i = 1}^{n}$ is a random sample of random variables that are tail equivalent to unit Fréchet random variables $(X, Y)$, then $\begin{aligned} q_{u_{n}} = \frac{\max_{1 \leq i \leq n} \left\{\frac{\max (X_{i}, u_{n})}{\max (Y_{i}, u_{n})} - 1\right\} + \max_{1 \leq i \leq n} \left\{\frac{\max (Y_{i}, u_{n})}{\max (X_{i}, u_{n})} - 1\right\}}{\max_{1 \leq i \leq n} \left\{\frac{\max (X_{i}, u_{n})}{\max (Y_{i}, u_{n})}\right\} \times \max_{1 \leq i \leq n} \left\{\frac{\max (Y_{i}, u_{n})}{\max (X_{i}, u_{n})}\right\} - 1} \end{aligned}$ (21) is the tail quotient correlation coefficient (TQCC), where the $u_{n}$ are varying thresholds that tend to infinity.

We present some theoretical results from Zhang et al. (2017) on the limit distribution of $q_{u_{n}}$ for two kinds of random thresholds $u_{n}$: $u_{n} = T_{n, t} \overset{P}{\to} \infty$ in Theorem 3.7, and $u_{n} = u_{n}^{*} a_{n}$ with $u_{n}^{*} \overset{P}{\to} u^{*} \in (0, \infty)$, $a_{n} \to \infty$ and $a_{n} / n \to 0$ as $n \to \infty$ in Theorem 3.8. The following assumption is needed.

Assumption T1: For $1 < t < 1 + δ, δ > 0$ , paired tail independent random variables $(X_{i}, Y_{i})$ satisfy $\begin{aligned} max_{1 \leq i \leq n} \frac{max (X_{i}, T_{n, t})}{max (Y_{i}, T_{n, t})} /max_{1 \leq i \leq n} \frac{max (X_{i}, T_{n, t})}{T_{n, t}} = 1 + o_{p} (1), \\ max_{1 \leq i \leq n} \frac{max (Y_{i}, T_{n, t})}{max (X_{i}, T_{n, t})} /max_{1 \leq i \leq n} \frac{max (Y_{i}, T_{n, t})}{T_{n, t}} = 1 + o_{p} (1) . \end{aligned}$

Remark 3.1

Assumption T1 is natural since the tail independence of $(X_{i}, Y_{i})$ (and hence of $(\max (X_{i}, T_{n, t}), \max (Y_{i}, T_{n, t}))$) implies that $\max (X_{i}, T_{n, t}), i = 1, \dots, n$, and $\max (Y_{i}, T_{n, t}), i = 1, \dots, n$, will hug $T_{n, t}$ in each axis direction when the threshold value $T_{n, t}$ is sufficiently large.

The following proposition is Proposition 2 in Zhang et al. (2017).

Proposition 3.6

If $P (X_{i} > u) / P (X_{i} > u, Y_{i} > u) \sim L (u) u^{- 1 + 1 / η}, η \in (0, 1]$, where $L (u)$ is a slowly varying function, as defined in Ledford and Tawn (1997), then T1 holds for $t η < 1$ when $η < 1$, and T1 does not hold when $η = 1$.

Theorem 3.7

Suppose that, for given t>1, all random variables $X_{1}^{'}, \dots, X_{n}^{'}$, $Y_{1}^{'}, \dots, Y_{n}^{'}$, and $T_{n, t}$ are independent, where $X_{i}^{'}$ and $Y_{i}^{'}$ are unit Fréchet random variables, and $T_{n, t}$ has the distribution function $\exp (- n / x^{t})$ for x>0. If $A_{n, t} = n \{1 - \exp (- 1 / T_{n, t})\}$, then

for z>0, $\begin{aligned} lim_{n \to \infty} P (A_{n, t}^{- 1} max_{1 \leq i \leq n} \frac{max (X_{i}^{'}, T_{n, t})}{max (Y_{i}^{'}, T_{n, t})} \leq z) \\ = \exp (- 1 / z); \end{aligned}$ for $z_{1} > 0$ and $z_{2} > 0$ , $\begin{aligned} lim_{n \to \infty} P (A_{n, t}^{- 1} max_{1 \leq i \leq n} \frac{max (X_{i}^{'}, T_{n, t})}{max (Y_{i}^{'}, T_{n, t})} \leq z_{1}, \\ A_{n, t}^{- 1} max_{1 \leq i \leq n} \frac{max (Y_{i}^{'}, T_{n, t})}{max (X_{i}^{'}, T_{n, t})} \leq z_{2}) \\ = \exp (- \frac{1}{z_{1}}) \exp (- \frac{1}{z_{2}}) . \end{aligned}$
Further, $\begin{aligned} 2 n \{1 - \exp (- 1 / T_{n, t})\} q_{T_{n, t}}^{'} \overset{L}{\longrightarrow} χ_{4}^{2}, \end{aligned}$ (22) where $χ_{4}^{2}$ is a chi-squared random variable with four degrees of freedom, and $q_{T_{n, t}}^{'}$ is defined as $q_{u_{n}}$ with $u_{n}$ replaced by $T_{n, t}$, $X_{i}$ by $X_{i}^{'}$, and $Y_{i}$ by $Y_{i}^{'}$, $i = 1, \dots, n$.

Theorem 3.8

Suppose $\{X_{1}^{'}, \dots, X_{n}^{'}, Y_{1}^{'}, \dots, Y_{n}^{'}\}$ are independent unit Fréchet random variables and $u_{n} = u_{n}^{*} a_{n}$ satisfies $u_{n}^{*} \overset{P}{\to} u^{*}$, $a_{n} \to \infty$, and $a_{n} / n \to 0$ as $n \to \infty$, where $u^{*} \in (0, \infty)$ is a constant. Then $2 n \{1 - \exp (- 1 / u_{n})\} q_{u_{n}}^{'} \overset{L}{\longrightarrow} χ_{4}^{2}$.

Theorems 3.7 and 3.8 are Theorems 3 and 4 in Zhang et al. (2017). In Zhang et al. (2017), $T_{n, t}$ is chosen to be a high threshold of the observed and transformed sequence. A practical rank transformation method for transforming the $X_{i}$'s to unit Fréchet scale was proposed in Zhang et al. (2011), where the transformation is based on a simulation idea. We will apply this rank transformation in our data section.

In dealing with tail dependence, $q_{u_{n}}$ and $q_{T_{n, t}}$ have the simplest explicit formulas compared with other measures that are only implicitly specified. They are easy to interpret and straightforward to compute. They are also stable in the sense that they converge to their corresponding population quantities, as in (19). It is hard to find other sample-based tail measures that share all of these properties. TQCC has been successfully applied in studies of financial risk contagion, precipitation extremes, haze extremes, and in medical studies. In this paper, we further illustrate its use in describing extreme co-movements and market contagion in Section 5.

3.2. New extreme value theory for heterogeneous populations

In the era of big data, data generated from multiple sources meet in a common place (cloud). Certainly, the data from each individual source has its own data generating process, i.e., a probability distribution. As such, classical extreme value theory reviewed in Section 2 cannot meet the need of big data extremes.

Considering the daily risk of high-frequency trading in a stock market, one can partition the data into hourly data (from 9:00am to 4:00pm). Suppose each hourly maximum $M_{j, n_{j}}$ of negative returns can be approximately modeled by an extreme value distribution $H_{j} (x)$. It is clear that the daily maximum $M_{n}$ is better modeled by a function of $H_{j} (x), j = 1, \dots, 7,$ than by any single $H_{j} (x)$. We use the following simple example with k = 2 to illustrate the idea.

Example 3.3

The sequence $\{X_{i}\}_{i = 1}^{n}$ is generated by $X_{i} = \max (Y_{i}, Z_{i})$, where $\{Y_{i}\}_{i = 1}^{n} \overset{i.i.d.}{\sim} F_{1} (x)$, $\{Z_{i}\}_{i = 1}^{n} \overset{i.i.d.}{\sim} F_{2} (x)$, $Y_{i}$ and $Z_{i}$ are independent, and $F_{1} (x)$ and $F_{2} (x)$ are the two corresponding distribution functions. Then $\{X_{i}\}_{i = 1}^{n} \overset{i.i.d.}{\sim} F (x) = F_{1} (x) F_{2} (x)$.
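The product form $F (x) = F_{1} (x) F_{2} (x)$ in Example 3.3 can be checked by simulation. The sketch below is only an illustration under assumed choices that do not come from the paper: $F_{1}$ is taken to be standard exponential, $F_{2}$ unit Fréchet, and the evaluation point and sample size are arbitrary.

```python
import math
import random


def cdf_exponential(x):
    """Assumed F1: standard exponential distribution function."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


def cdf_unit_frechet(x):
    """Assumed F2: unit Frechet distribution function."""
    return math.exp(-1.0 / x) if x > 0 else 0.0


def empirical_cdf_of_max(n, x, rng):
    """Fraction of simulated X_i = max(Y_i, Z_i) that do not exceed x."""
    count = 0
    for _ in range(n):
        y = rng.expovariate(1.0)        # Y_i ~ F1
        z = 1.0 / rng.expovariate(1.0)  # Z_i ~ F2, independent of Y_i
        if max(y, z) <= x:
            count += 1
    return count / n


rng = random.Random(1)
x0 = 2.0  # arbitrary evaluation point
empirical = empirical_cdf_of_max(20000, x0, rng)
theoretical = cdf_exponential(x0) * cdf_unit_frechet(x0)

print("empirical P(X <= x0):", empirical)
print("F1(x0) * F2(x0)     :", theoretical)
```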

Remark 3.2

The form $X_{i} = \max (Y_{i}, Z_{i})$ is the simplest case in the general mixture models introduced in Zhao and Zhang (2018). It is also the simplest case in the copula structured M4 models studied by Zhang and Zhu (2016).

Figure 1 presents boxplots of hourly maxima of negative returns of the euro against US dollar exchange rate, calculated from 1-minute returns and 5-minute returns in 24 one-hour intervals (h0: 12:00 AM-1:00 AM, h1: 1:00 AM-2:00 AM, …, h23: 11:00 PM-11:59 PM) from 01/01/2003 to 12/31/2018. Clearly, trading behavior differs across time intervals. The daily maximum can fall in any of those 24 hourly intervals. As a result, the daily maximum is a mixture of hourly maxima. Motivated by observations of this kind, Cao and Zhang (2020) developed a new extreme value theory for maxima of maxima.

Figure 1. Hourly maxima of 1-minute (left panel) and 5-minute (right panel) negative returns of the euro against US dollar exchange rate. The x-axis tick labels are h0 (12:00 AM-1:00 AM), h1 (1:00 AM-2:00 AM), …, h23 (11:00 PM-11:59 PM) from left to right.

Suppose that the mixed sequence $\{X_{i}\}_{i = 1}^{n}$ is composed of k subsequences $\{X_{j, i}\}_{i = 1}^{n_{j}}, j = 1, 2, \dots, k$, with $n_{j} \to \infty$ as $n \to \infty$ and $n = n_{1} + \dots + n_{k}$. Denote by $M_{j, n_{j}} = \max (X_{j, i}, i = 1, \dots, n_{j})$ the maximum of each subsequence, where $\{X_{j, i}\}_{i = 1}^{n_{j}} \overset{i.i.d.}{\sim} F_{j} (x), j = 1, 2, \dots, k$. Suppose $F_{j} \in \mathrm{MDA} (H_{j})$, i.e., $M_{j, n_{j}}$ has the following limit distribution with some norming constants $a_{j, n_{j}} > 0, b_{j, n_{j}} \in R$: $\begin{aligned} \lim_{n \to \infty} P (a_{j, n_{j}}^{- 1} (M_{j, n_{j}} - b_{j, n_{j}}) \leq x) = H_{j} (x) . \end{aligned}$ (23) Define $M_{n} = \max (M_{1, n_{1}}, M_{2, n_{2}}, \dots, M_{k, n_{k}})$. Several questions arise: (1) whether or not (3) holds with appropriately chosen norming constants $a_{n} > 0, b_{n} \in R$; (2) if (1) holds, whether or not $a_{n}, b_{n}$ are equivalent to any of the $a_{j, n_{j}}, b_{j, n_{j}}$; (3) whether or not $H (x)$ is a function of the $H_{j} (x)$; (4) if (1)–(3) all hold, which method is best to use in practice. We present some new results from Cao and Zhang (2020) next.

Theorem 3.9

If $M_{1, n_{1}}$ and $M_{2, n_{2}}$ satisfy (23) for j = 1, 2, the limit distribution of $M_{n}$ as $n \to \infty$ can be determined in the following cases:

Case 1. If $\frac{a_{2, n_{2}}}{a_{1, n_{1}}} \to a > 0$ and $a_{1, n_{1}}^{- 1} (b_{2, n_{2}} - b_{1, n_{1}}) \to b < + \infty$ for some constants a and b, then $\begin{aligned} P (a_{2, n_{2}}^{- 1} (M_{n} - b_{2, n_{2}}) \leq x) \to H_{1} (a x + b) H_{2} (x) . \end{aligned}$ (24)
Case 2. If $\frac{a_{2, n_{2}}}{a_{1, n_{1}}} \to 0$ and $a_{1, n_{1}}^{- 1} (b_{2, n_{2}} - b_{1, n_{1}}) \to + \infty$, then $\begin{aligned} P (a_{2, n_{2}}^{- 1} (M_{n} - b_{2, n_{2}}) \leq x) \to H_{2} (x) . \end{aligned}$ (25)

Definition 3.10

For the independent sequence $\{X_{i}\}_{i = 1}^{n}$ with two subsequences $\{X_{1, i}\}_{i = 1}^{n_{1}}$ and $\{X_{2, i}\}_{i = 1}^{n_{2}}$ defined as above, suppose (23) is satisfied for j = 1, 2 with norming constants $a_{j, n_{j}} > 0, b_{j, n_{j}} \in R$, i.e., $\begin{aligned} \lim_{n \to \infty} P (a_{j, n_{j}}^{- 1} (M_{j, n_{j}} - b_{j, n_{j}}) \leq x) = H_{j} (x), \quad j = 1, 2, \end{aligned}$ (26) and $\begin{aligned} P (a_{1, n_{1}}^{- 1} (M_{1, n_{1}} - b_{1, n_{1}}) \leq x, a_{2, n_{2}}^{- 1} (M_{2, n_{2}} - b_{2, n_{2}}) \leq x) \to H (x) = H_{1} (x) H_{2} (x) . \end{aligned}$ (27) Then we call $H (x) = H_{1} (x) H_{2} (x)$ the accelerated max-stable distribution, which is the product of two max-stable distributions.

Since $H_{1} (x)$ and $H_{2} (x)$ are max-stable distributions, for any $n_{1} = 2, 3, \dots$ and $n_{2} = 2, 3, \dots$, there are constants $a_{1, n_{1}} > 0$, $b_{1, n_{1}} \in R$, $a_{2, n_{2}} > 0$, $b_{2, n_{2}} \in R$ such that $H_{1} (x) H_{2} (x) = H_{1}^{n_{1}} (a_{1, n_{1}} x + b_{1, n_{1}}) H_{2}^{n_{2}} (a_{2, n_{2}} x + b_{2, n_{2}})$.

In equation (27), we considered the convergence of $P (\max (a_{1, n_{1}}^{- 1} (M_{1, n_{1}} - b_{1, n_{1}}), a_{2, n_{2}}^{- 1} (M_{2, n_{2}} - b_{2, n_{2}})) \leq x)$ instead of the traditional $P (a_{n}^{- 1} (M_{n} - b_{n}) \leq x)$. If $n_{1}$ and $n_{2}$ are sufficiently large, by (26) we have $P (a_{1, n_{1}}^{- 1} (M_{1, n_{1}} - b_{1, n_{1}}) \leq x) \approx H_{1} (x)$ and $P (a_{2, n_{2}}^{- 1} (M_{2, n_{2}} - b_{2, n_{2}}) \leq x) \approx H_{2} (x)$, so that $\begin{aligned} P (M_{n} \leq x) & = P (\max (M_{1, n_{1}}, M_{2, n_{2}}) \leq x) \\ & = P (M_{1, n_{1}} \leq x) P (M_{2, n_{2}} \leq x) \\ & \approx H_{1} (a_{1, n_{1}}^{- 1} (x - b_{1, n_{1}})) H_{2} (a_{2, n_{2}}^{- 1} (x - b_{2, n_{2}})) \\ & = H_{1}^{*} (x) H_{2}^{*} (x), \end{aligned}$ (28) where $H_{j}^{*}$ is of the same type as $H_{j}$, j = 1, 2.

Theorem 3.11

Suppose $\{X_{i}\}_{i = 1}^{n}$ is an independent sequence composed of two subsequences $\{X_{1, i}\}_{i = 1}^{n_{1}}$ and $\{X_{2, i}\}_{i = 1}^{n_{2}}$ with underlying distributions $F_{1} (x)$ and $F_{2} (x)$, where $n_{1} \to \infty$ and $n_{2} \to \infty$ as $n \to \infty$. Let $0 \leq τ < \infty$, and let $\{u_{1, i}\}_{i = 1}^{n_{1}}$ and $\{u_{2, i}\}_{i = 1}^{n_{2}}$ be two sequences of real numbers such that $\begin{aligned} n_{1} (1 - F_{1} (u_{1, n_{1}})) + n_{2} (1 - F_{2} (u_{2, n_{2}})) \to τ \quad \text{as } n \to \infty . \end{aligned}$ (29) Then $\begin{aligned} P (M_{1, n_{1}} \leq u_{1, n_{1}}, M_{2, n_{2}} \leq u_{2, n_{2}}) \to e^{- τ} \quad \text{as } n \to \infty . \end{aligned}$ (30) Conversely, if (30) holds for some $0 \leq τ < \infty$, then so does (29).
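A small numerical check can make the statement of Theorem 3.11 concrete. The sketch below is only an illustration under assumed choices that are not from the paper: $F_{1}$ is standard exponential, $F_{2}$ is unit Fréchet, $n_{2} = 2 n_{1}$, and the thresholds are chosen so that $n_{1} (1 - F_{1} (u_{1, n_{1}})) = τ_{1}$ and $n_{2} (1 - F_{2} (u_{2, n_{2}})) = τ_{2}$ exactly, with $τ = τ_{1} + τ_{2}$. Because the two subsequences are independent, the probability in (30) equals $F_{1}^{n_{1}} (u_{1, n_{1}}) F_{2}^{n_{2}} (u_{2, n_{2}})$ and can be evaluated without simulation.

```python
import math


def joint_nonexceedance(n1, n2, tau1, tau2):
    """P(M_{1,n1} <= u1, M_{2,n2} <= u2) for independent subsequences.

    Assumes F1 standard exponential and F2 unit Frechet, with thresholds
    solving n1*(1 - F1(u1)) = tau1 and n2*(1 - F2(u2)) = tau2.
    Requires 0 < tau1 < n1 and 0 < tau2 < n2.
    """
    u1 = math.log(n1 / tau1)               # n1 * exp(-u1) = tau1
    u2 = -1.0 / math.log1p(-tau2 / n2)     # n2 * (1 - exp(-1/u2)) = tau2
    f1 = 1.0 - math.exp(-u1)               # F1(u1)
    f2 = math.exp(-1.0 / u2)               # F2(u2)
    return f1 ** n1 * f2 ** n2


tau1, tau2 = 1.0, 0.5  # illustrative values
limit = math.exp(-(tau1 + tau2))
for n1 in (10, 100, 10_000, 1_000_000):
    print(n1, joint_nonexceedance(n1, 2 * n1, tau1, tau2), limit)
```

The printed probabilities approach $e^{- τ}$ as $n_{1}$ grows, in line with (30).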

Remark 3.3

Since $1 - F_{j} (u_{j, n_{j}})$ is the probability that $X_{j, i}$ exceeds the level $u_{j, n_{j}}$, equation (29) means that the total expected number of exceedances of $u_{1, n_{1}}$ by $\{X_{1, i}\}_{i = 1}^{n_{1}}$ and of $u_{2, n_{2}}$ by $\{X_{2, i}\}_{i = 1}^{n_{2}}$ converges to τ. When the sequence is generated from a single distribution $F (x)$, Theorem 3.11 reduces to the classical result by choosing $u_{1, n_{1}} = u_{2, n_{2}} = u_{n}$. That is, $\begin{aligned} n (1 - F (u_{n})) \to τ \end{aligned}$ (31) if and only if $\begin{aligned} P (M_{n} \leq u_{n}) \to e^{- τ} \end{aligned}$ (32) as $n \to \infty$.

These new developments, together with those in Cao and Zhang (2020), shed light on new research directions in extreme values from heterogeneous populations. They provide the probabilistic foundation for the models introduced in the next section.

4. Transforming ARMA models to models for extreme value observations

The additive structures in traditional time series models, e.g., ARMA models, and their extensions, e.g., GARCH models, cannot describe the extremal clusters and tail dependence satisfactorily in many applications. To solve this issue, alternative models have been proposed in the extreme value literature. These models transform the additive structures in ARMA models to the competing structures in extreme observations (hidden and/or observable). Several such transformations are discussed in the following subsections.

4.1. Moving minimum corresponding process

Deheuvels (1983) defined what he called the moving minimum (MM) corresponding process as $T_{i} = \min \{δ_{k} Z_{i - k}, - \infty < k < \infty\}, - \infty < i < \infty,$ where $δ_{k} > 0$ and $\{Z_{k}\}$ are i.i.d. standard exponential random variables. The main theorem of Deheuvels (1983) is stated as follows.

Theorem 4.1

If $(T_{0}, \dots, T_{m})$ follows a joint multivariate extreme value distribution for minima with standard exponentially distributed marginal random variables, then there exist m + 1 sequences $\{a_{k}^{i} (n), - \infty < k < \infty\}$ of positive numbers, depending on $n = 1, 2, \dots,$ such that, if $T_{i} (n) = \min \{a_{k}^{i} (n) Z_{- k}, - \infty < k < \infty\}$, $i = 0, \dots, m$, then $(T_{0} (n), \dots, T_{m} (n))$ converges in distribution to $(T_{0}, \dots, T_{m})$ as $n \to \infty$.

The results of Deheuvels (1983) are very strong, but the model itself is still not easily tractable for parameter estimation. Notice that the reciprocal of $T_{i}$ gives the moving maximum process $\begin{aligned} \frac{1}{T_{i}} = \max \left\{\frac{1}{δ_{k}} Z_{i - k}^{'}, - \infty < k < \infty\right\}, \quad - \infty < i < \infty, \end{aligned}$ where $Z_{k}^{'} = 1 / Z_{k}$ are i.i.d. unit Fréchet random variables.

4.2. Max-autoregressive moving average process

Davis and Resnick (1989) studied what they called the max-autoregressive moving average (MARMA $(p, q)$) process, a stationary process $\{X_{n}\}$ that satisfies the MARMA recursion $\begin{aligned} X_{n} = (ϕ_{1} X_{n - 1}) \lor \dots \lor (ϕ_{p} X_{n - p}) \lor Z_{n} \lor (θ_{1} Z_{n - 1}) \lor \dots \lor (θ_{q} Z_{n - q}) \end{aligned}$ for all n, where ∨ is the maximum operator, i.e., $a \lor b = \max (a, b)$, $ϕ_{i}, θ_{j} \geq 0, 1 \leq i \leq p, 1 \leq j \leq q$, and $\{Z_{n}\}$ is i.i.d. with common distribution function $F (x) = \exp \{- σ x^{- 1}\}$. For any given $\{ϕ_{i}\}, \{θ_{j}\}$, the corresponding process is a max-stable process. They argued that “it is unlikely that another subclass of the max-stable processes can be found which is as broad and tractable as the MARMA class”. Some basic properties of MARMA processes have been established, and the prediction of a max-stable process has been studied relatively completely; for prediction, see also Davis and Resnick (1993). However, much less is known about the estimation of MARMA processes. A naive estimation procedure for the $ϕ_{i}$'s and $θ_{j}$'s when the order q = 1 is given in Davis and Resnick (1989).
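To make the MARMA recursion concrete, the following minimal sketch simulates a MARMA(1, 1) path. The parameter values, the burn-in length and the function name are illustrative assumptions, and the restriction $0 \leq ϕ_{1} < 1$ is imposed here only so that the influence of a large shock decays. The innovations are drawn as $Z_{n} = σ / E_{n}$ with $E_{n}$ standard exponential, which gives $P (Z_{n} \leq x) = \exp \{- σ x^{- 1}\}$.

```python
import random


def simulate_marma11(n, phi, theta, sigma, rng, burn_in=200):
    """Simulate X_t = max(phi*X_{t-1}, Z_t, theta*Z_{t-1}).

    Z_t are i.i.d. with distribution function exp(-sigma/x). The chain is
    started at an arbitrary point and the first `burn_in` values are dropped
    (an ad hoc device, not an exact stationary initialisation).
    """
    z_prev = sigma / rng.expovariate(1.0)
    x_prev = z_prev
    xs, zs = [], []
    for t in range(n + burn_in):
        z = sigma / rng.expovariate(1.0)
        x = max(phi * x_prev, z, theta * z_prev)
        if t >= burn_in:
            xs.append(x)
            zs.append(z)
        x_prev, z_prev = x, z
    return xs, zs


phi, theta, sigma = 0.7, 0.5, 1.0  # illustrative parameter values
xs, zs = simulate_marma11(1000, phi, theta, sigma, random.Random(2))
print(xs[:5])
```

After a large innovation, successive values decay geometrically at rate $ϕ_{1}$ until a new innovation takes over, which is the kind of extremal clustering that additive ARMA structures do not reproduce.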

4.3. Multivariate maxima of moving maxima process

Smith and Weissman (1996) extended Deheuvels' MM process to a more general framework called the multivariate maxima of moving maxima (henceforth M4) process. The definition is $\begin{aligned} Y_{i d} = \max_{l} \max_{k} a_{l, k, d} Z_{l, i - k}, \quad d = 1, \dots, D, \end{aligned}$ (33) where $\{Z_{l i}, l \geq 1, - \infty < i < \infty\}$ is an array of independent unit Fréchet random variables. The constants $\{a_{l, k, d}, l \geq 1, - \infty < k < \infty, 1 \leq d \leq D\}$ are nonnegative and satisfy $\begin{aligned} \sum_{l = 1}^{\infty} \sum_{k = - \infty}^{\infty} a_{l, k, d} = 1 \quad \text{for } d = 1, \dots, D . \end{aligned}$ (34) M4 processes deal with D-dimensional random processes, whereas MM processes deal with univariate processes (D = 1). Under model (33), Smith and Weissman (1996) obtained very attractive results, some of which parallel the results of Deheuvels (1983). Although MM processes are specified over only one index, they can easily be extended to two indexes. The extension of MM processes to M4 processes raises hopes of estimating model parameters easily.
Following de Haan (1984), (33) defines max-stable processes because for any finite number r and positive constants $\{y_{i d}\}$ we have $\begin{aligned} & P (Y_{i d} \leq y_{i d}, 1 \leq i \leq r, 1 \leq d \leq D) \\ & = P \left(Z_{l, i - k} \leq \frac{y_{i d}}{a_{l, k, d}} \text{ for } l \geq 1, - \infty < k < \infty, 1 \leq i \leq r, 1 \leq d \leq D\right) \\ & = P \left(Z_{l, m} \leq \min_{1 - m \leq k \leq r - m} \min_{1 \leq d \leq D} \frac{y_{m + k, d}}{a_{l, k, d}}, l \geq 1, - \infty < m < \infty\right) \\ & = \exp \left(- \sum_{l = 1}^{\infty} \sum_{m = - \infty}^{\infty} \max_{1 - m \leq k \leq r - m} \max_{1 \leq d \leq D} \frac{a_{l, k, d}}{y_{m + k, d}}\right) . \end{aligned}$ (35) This is (2.5) of Smith and Weissman (1996), and we have $\begin{aligned} P^{n} (Y_{i d} \leq n y_{i d}, 1 \leq i \leq r, 1 \leq d \leq D) = P (Y_{i d} \leq y_{i d}, 1 \leq i \leq r, 1 \leq d \leq D), \end{aligned}$ which shows that $\{Y_{i}\}$ is max-stable. They argued that the extreme values of a multivariate stationary process may be characterised in terms of a limit max-stable process under quite general conditions. They also showed that a very large class of max-stable processes may be approximated by M4 processes, mainly because those processes have the same multivariate extremal index (Theorem 2.3 in Smith & Weissman, 1996). The theorem and conditions appear below.

Now fix $τ = (τ_{1}, \dots, τ_{D})^{'}$ with $0 \leq τ_{d} < \infty,$ $d = 1, \dots, D$. Let $\{u_{n d}, n \geq 1\}$ be a sequence of thresholds such that $n \{1 - F_{d} (u_{n d})\} \to τ_{d}$ under the model assumption. Since $Z_{l k}$ is unit Fréchet we can take $u_{n d} = \frac{n}{τ_{d}}$. Denote $u_{n} = (u_{n 1}, \dots, u_{n D})$ and let $B_{j}^{k} (u_{n})$ be the σ-field generated by the events $\{X_{i d} \leq u_{n d}, j \leq i \leq k, 1 \leq d \leq D\}$ for $1 \leq j \leq k \leq n$. Define $\begin{aligned} α_{n t} = \sup \{| P (A \cap B) - P (A) P (B) | : A \in B_{1}^{k} (u_{n}), B \in B_{k + t}^{n} (u_{n})\}, \end{aligned}$ (36) where the supremum is taken over $1 \leq k \leq n - t$ and the two respective σ-fields. If there exists a sequence $\{t_{n}, n \geq 1\}$ such that $\begin{aligned} t_{n} \to \infty, \quad t_{n} / n \to 0, \quad α_{n, t_{n}} \to 0 \quad \text{as } n \to \infty, \end{aligned}$ (37) the mixing condition $\Delta (u_{n})$ is said to hold (Nandagopalan, 1994; Smith & Weissman, 1996). Further assume there exists a sequence $\{k_{n}, n \geq 1\}$ such that $\begin{aligned} k_{n} \to \infty, \quad k_{n} t_{n} / n \to 0, \quad k_{n} α_{n, t_{n}} \to 0 \quad \text{as } n \to \infty . \end{aligned}$ (38) Let $r_{n} = [n / k_{n}]$ be the integer part of $n / k_{n}$. We now state a lemma and a theorem (Lemma 2.2 and the main theorem, Theorem 2.3, of Smith & Weissman, 1996).

Lemma 4.2

Suppose (36)–(38) hold. Then $\begin{aligned} θ (τ) = \lim_{n \to \infty} P \left(Y_{i d} \leq u_{n d}, 2 \leq i \leq r_{n}, 1 \leq d \leq D \mid \max_{d} \left(\frac{Y_{1 d}}{u_{n d}}\right) > 1\right) . \end{aligned}$ (39) Alternatively, if we assume $\begin{aligned} \lim_{r \to \infty} \lim_{n \to \infty} \sum_{i = r}^{r_{n}} \sum_{d = 1}^{D} P \left(Y_{i d} > u_{n d} \mid \max_{d} \left(\frac{Y_{1 d}}{u_{n d}}\right) > 1\right) = 0, \end{aligned}$ (40) then (39) is equivalent to $\begin{aligned} θ (τ) = \lim_{r \to \infty} \lim_{n \to \infty} P \left(Y_{i d} \leq u_{n d}, 2 \leq i \leq r, 1 \leq d \leq D \mid \max_{d} \left(\frac{Y_{1 d}}{u_{n d}}\right) > 1\right) . \end{aligned}$ (41)

This lemma is basically a restatement of results of O'Brien; see, for example, O'Brien (1987).

Theorem 4.3

Suppose $\Delta (u_{n})$ and (40) hold for $\{Y_{i}\}$, so that the multivariate extremal index $θ^{Y} (τ)$ is given by (41). Suppose also that the same assumptions hold for $\{X_{i}\}$ (with the same $t_{n}$, $k_{n}$ sequences), so that the multivariate extremal index $θ^{X} (τ)$ is also given by (41) with $X_{i d}$ replacing $Y_{i d}$ everywhere. Then $θ^{Y} (τ) = θ^{X} (τ)$.

The extremal index of the process defined by (33) is $\begin{aligned} θ (τ) = \frac{\sum_{l} \max_{k} \max_{d} a_{l, k, d} τ_{d}}{\sum_{l} \sum_{k} \max_{d} a_{l, k, d} τ_{d}} . \end{aligned}$ (42) However, $θ (τ)$ is not easy to obtain from observed data, as one has to estimate all parameters $a_{l, k, d}$, which is not straightforward.
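When the coefficients are known and only finitely many are non-zero, formula (42) is simple to evaluate. The sketch below assumes a hypothetical nested-list layout `a[l][k][d]` for the coefficients and an illustrative bivariate example; neither comes from the paper.

```python
def extremal_index(a, tau):
    """Evaluate theta(tau) in (42) for finitely many M4 coefficients.

    a[l][k][d] >= 0 is the coefficient of signature pattern l at lag
    position k for component d; tau[d] > 0.
    """
    numerator = 0.0
    denominator = 0.0
    for pattern in a:
        # max over d of a_{l,k,d} * tau_d, for each lag position k
        weighted = [max(c * t for c, t in zip(row, tau)) for row in pattern]
        numerator += max(weighted)    # max over k
        denominator += sum(weighted)  # sum over k
    return numerator / denominator


# Illustrative bivariate example with two signature patterns and two lags;
# for each component d the coefficients sum to one, as required by (34).
a_example = [
    [[0.3, 0.1], [0.2, 0.4]],
    [[0.5, 0.5], [0.0, 0.0]],
]
theta = extremal_index(a_example, [1.0, 1.0])
print(theta)
```

For a univariate moving maximum with two equal weights, `extremal_index([[[0.5], [0.5]]], [1.0])` evaluates to 0.5, reflecting clusters of two consecutive large values.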

We see that Sections 4.1–4.3 deal with probabilistic aspects of time series models for observed extreme value processes. Although theoretical results have been obtained, the estimation of parameters in both MARMA $(p, q)$ and M4 processes is not well developed, and the applications of the two processes are very limited. In the next four subsections, we discuss statistical inference and applications.

4.4. Statistical inference of moving maximum models

Hall et al. (2002) discussed moving maximum models $Y_{j} = \sup \{a_{j - i} Z_{i}, - \infty < i < \infty\}$, where the distribution of $Z_{i}$ is assumed to be either $F (z | θ) = \exp (- z^{- θ})$ or the generalised Pareto distribution $F (z | θ) = 1 - (1 + z)^{- θ}$. Then, for a finite number of parameters, they chose $(θ, a_{(m)})$ to minimise $\begin{aligned} D_{m} (θ, a_{(m)}) = \int \left\{\hat{G} (y) - \prod_{i = 2 - m}^{k} F \left(\min \{a_{j - i}^{- 1} y_{j}, \max (i, 1) \leq j \leq \min (i + m, k)\} \mid θ\right)\right\}^{2} w (y) \, d y, \end{aligned}$ (43) where the integral is over $y = (y_{1}, \dots, y_{k}) \in R_{+}^{k}$, $\begin{aligned} \hat{G} (y) = (n - k)^{- 1} \sum_{i = 1}^{n - k} I (Y_{i + j - 1} \leq y_{j} \text{ for } 1 \leq j \leq k), \end{aligned}$ (44) and w is a nonnegative weight function. We state their main theorem as follows.

Theorem 4.4

Suppose the following conditions hold:

F has support on the positive half-line, and is in the domain of attraction of a Type II extreme value distribution;
each $a_{i}$ is nonnegative and, for some $ϵ \in (0, r)$ , $0 < \sum_{i} a_{i}^{r - ϵ} < \infty$ .

Then $\begin{aligned} \sup_{- \infty < y_{1}, \dots, y_{k} < \infty} | P (Y_{1}^{*} \leq y_{1}, \dots, Y_{k}^{*} \leq y_{k} \mid Y_{1}, \dots, Y_{n}) - P (Y_{1} \leq y_{1}, \dots, Y_{k} \leq y_{k}) | \to 0, \end{aligned}$ (45) where $Y_{j}^{*}$ is defined by $Y_{j}^{*} = \sup \{{\hat{a}}_{j - i} Z_{i}^{*}, - \infty < i < \infty\},$ ${\hat{a}}_{j - i}$ and $\hat{θ}$ are the minimisers of (43), and $Z_{i}^{*}$ has distribution function $F (\cdot | \hat{θ})$. Moreover, if $m \geq C_{4} (\log n)^{2}$ for $C_{4}$ sufficiently large, the rate of convergence in (45) is $O_{p} (n^{- (1 / 2) + δ})$ for all $δ > 0$.

4.5. Finite representations of M4 processes

Models that have too many parameters to estimate, or a complicated structure that lacks interpretability, are hardly applicable to real data with a finite number of observations. This section reviews finite representations of M4 processes and their applications.

A finite dimensional M4 process can be written as follows: (46) $\begin{aligned} Y_{i d} = max_{1 \leq l \leq L} max_{- K_{1} \leq k \leq K_{2}} a_{l, k, d} Z_{l, i - k}, d = 1, \dots, D, \end{aligned}$ (46) where $\sum_{l = 1}^{L} \sum_{k = - K_{1}}^{K_{2}} a_{l, k, d} = 1$ for $d = 1, \dots, D$ .

Under model (46), it is possible that a big value of $Z_{l k}$ dominates all other Z values within a certain period of length $K_{2} + K_{1} + 1$ and creates a moving pattern, i.e., $Y_{i d} = a_{l, i - k, d} Z_{l k}$ for i close to k. A moving pattern is known as a signature pattern. Zhang and Smith (2004) gave a full investigation of the probabilistic properties of model (46). Zhang and Smith (2010) studied the estimation of the model and considered the bivariate joint probabilities. A general joint probability formula for (46) is $\begin{aligned} P (Y_{i d} \leq y_{i d}, 1 \leq i \leq r, 1 \leq d \leq D) = \exp \left(- \sum_{l = 1}^{L} \sum_{m = 1 - K}^{r} \max_{1 - m \leq k \leq r - m} \max_{1 \leq d \leq D} \frac{a_{l, k, d}}{y_{m + k, d}}\right), \end{aligned}$ (47) where $a_{l, k, d} = 0$ when the triple subindex is outside the range defined in (46).
Besides this general formula, it follows immediately that $P (Y_{i d} \leq y) = e^{- 1 / y}$, which establishes that $Y_{i d}$ is itself a unit Fréchet random variable, and the following two special cases are used to construct estimators: $\begin{aligned} P (Y_{i d} \leq 1, Y_{i + 1, d} \leq x) = \exp \left(- \sum_{l = 1}^{L} \sum_{m = 1 - K}^{2} \max \left(a_{l, 1 - m, d}, \frac{a_{l, 2 - m, d}}{x}\right)\right) =: e^{- b_{0 d} (x)}, \end{aligned}$ (48) and $\begin{aligned} P (Y_{i d} \leq 1, Y_{i d^{'}} \leq x) = \exp \left(- \sum_{l = 1}^{L} \sum_{m = 1 - K}^{1} \max \left(a_{l, 1 - m, d}, \frac{a_{l, 1 - m, d^{'}}}{x}\right)\right) =: e^{- b_{d d^{'}} (x)} . \end{aligned}$ (49) It is clear that for each d, we can define new piecewise linear functions $q_{0 d} (x) := x b_{0 d} (x)$ and $q_{d d^{'}} (x) := x b_{d d^{'}} (x)$, where the notation A := B means that A is defined as B, and the points where these piecewise linear functions change slopes are at $a_{l, j, d} / a_{l, j^{'}, d}$ or $a_{l, k, d} / a_{l, k, d^{'}}$. This suggests that if we can identify the functions $q_{0 d} (x)$ or $q_{d d^{'}} (x)$, we may be able to identify all the parameters $a_{l, k, d}$.
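The lag-one formula (48) can be checked against simulation in a small univariate case. The sketch below assumes D = 1, $K_{1} = 0$ and $K_{2} = K$, so that the coefficients can be stored as a hypothetical nested list `a[l][k]` with k = 0, …, K. The coefficient values and function names are illustrative and are not taken from Zhang and Smith (2010).

```python
import math
import random


def b0(a, x):
    """Exponent b_0(x) in (48) for a univariate finite M4 process.

    a[l][k] >= 0 for k = 0..K; coefficients outside this range are zero.
    """
    total = 0.0
    for row in a:
        padded = [0.0] + list(row) + [0.0]  # a_{l,-1} = a_{l,K+1} = 0
        for j in range(len(padded) - 1):
            total += max(padded[j], padded[j + 1] / x)
    return total


def q0(a, x):
    """Piecewise linear function q_0(x) = x * b_0(x)."""
    return x * b0(a, x)


def simulate_m4(n, a, rng):
    """Simulate Y_i = max_l max_k a[l][k] * Z_{l,i-k} with unit Frechet Z."""
    K = len(a[0]) - 1
    z = [[1.0 / rng.expovariate(1.0) for _ in range(n + K)] for _ in a]
    # z[l][i + K - k] plays the role of Z_{l, i-k}
    return [max(row[k] * z[l][i + K - k]
                for l, row in enumerate(a) for k in range(K + 1))
            for i in range(n)]


a_example = [[0.4, 0.2], [0.1, 0.3]]  # illustrative; all entries sum to one
x0 = 2.0
ys = simulate_m4(20000, a_example, random.Random(3))
pairs = list(zip(ys[:-1], ys[1:]))
empirical = sum(1 for y0, y1 in pairs if y0 <= 1.0 and y1 <= x0) / len(pairs)

print("empirical  :", empirical)
print("formula(48):", math.exp(-b0(a_example, x0)))
```

Evaluating `q0` on a grid of x values shows its slope changing at ratios of coefficients within a signature pattern, which is the feature exploited to identify the parameters.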

Relating (48) and (49) to their empirical distribution counterparts, Zhang and Smith (2010) solve a system of piecewise linear functions to construct parameter estimators. The consistency and asymptotic normality of the estimators are established, and a financial application to value at risk (VaR) is conducted. A new extreme co-movement measure is defined as $\begin{aligned} λ (t, T) = \lim_{u ↗ x_{F}} P (ξ (t, T, u) \geq 2 \mid ξ (0, t, u) \geq 1), \end{aligned}$ (50) where $\begin{aligned} ξ (t, T, u) = \max_{t \leq i \leq T} \sum_{d = 1}^{D} I (Y_{i d} > u_{d}) . \end{aligned}$ (51) The idea in (50) is to measure the probability that at least two of the D components jointly exceed their thresholds at some time in the period from t to T, given at least one exceedance in $(0, t)$. The case t = T = 0 and D = 2 is the usual tail dependence function in the literature (Embrechts et al., 2003). Zhang and Smith (2010) demonstrated that (50) is a meaningful market extreme co-movement measure. The tail dependence index λ, the coefficient of tail dependence η, the lag-k tail dependence index $λ_{k}$, the extremal index $θ (τ)$, and the extreme co-movement measure $λ (t, T)$ can be very useful in studying market crises and contagion.

4.6. Sparse representations of M4 processes

To increase the estimation efficiency, a common strategy in statistical inference is to reduce the model complexity, i.e., to reduce the number of parameters. Examples include variable selection in linear regression models and the sparsity assumption in high-dimensional covariance matrix estimation. In time series, the number of parameters in an autoregressive model is often smaller than the number of parameters in a moving average model when both are fitted to the same time series. To reduce the number of unknown parameters in (46), Zhang (Citation2008a) considered using geometric moving patterns to study extreme sea wave movements. The number of parameters in Zhang (Citation2008a) is much smaller than the number of parameters in the model studied by Zhang and Smith (Citation2010). This section discusses three scenarios that further simplify model (33) or (46) to more interpretable and workable models.

4.6.1. Markov chain MM process

In this section, we consider a univariate time series model. Under model (33), we have the following lag-k tail dependence index formula (dropping the index d): (52) $\begin{aligned} λ_{k} = \sum_{l = 1}^{\infty} \sum_{m = - \infty}^{\infty} min (a_{l, 1 - m}, a_{l, 1 + k - m}) . \end{aligned}$ (52) Obviously, as long as both $a_{l 0}$ and $a_{l K}$ are non-zero for some l, $Y_{i}$ and $Y_{i + K}$ are dependent, and of course tail dependent, as can be seen from (52). Zhang (Citation2005) considered the matrix of weights $(a_{l k})$ to have the following structure: (53) $\begin{aligned} (a_{l k}) = (\begin{matrix} a_{00} & 0 & 0 & 0 & \dots & 0 \\ a_{10} & a_{11} & 0 & 0 & \dots & 0 \\ a_{20} & 0 & a_{22} & 0 & \dots & 0 \\ ⋮ & ⋮ & ⋮ & ⋮ & ⋱ & 0 \\ a_{L 0} & 0 & 0 & 0 & \dots & a_{L L} \end{matrix}) . \end{aligned}$ (53) Now the number L corresponds to the maximal lag of tail dependencies within the sequence; the lag-k tail dependence index is characterised by the coefficients $a_{k 0}$ and $a_{k k}$ . The coefficient $a_{00}$ represents the proportion of observations that are drawn from an independent process ${Z_{0 i}}$ . In other words, a very large value at time 0 has no future impact when the large value is generated from ${Z_{0 i}}$ . If both $a_{k 0}$ and $a_{k k}$ are non-zero, then a very large value at time 0 has an impact at time k when the large value is generated from ${Z_{k i}}$ . If there is strong lag-k tail dependence for each k, the value of $a_{00}$ will be small.
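To make formula (52) concrete for a finite weight matrix of the form (53), the sketch below evaluates the double sum after the change of index j = 1 - m, so that the lag-k index becomes the sum over l and j of min(a_{l,j}, a_{l,j+k}). The helper names `sparse_weights` and `lag_tail_dependence`, and the numerical weights, are hypothetical choices for illustration.

```python
import numpy as np

def sparse_weights(a00, pairs):
    """Build a matrix with the structure of (53).

    a00   : weight of the independent component (row 0, column 0)
    pairs : list of (a_l0, a_ll) for l = 1, ..., L (illustrative input format)
    """
    L = len(pairs)
    a = np.zeros((L + 1, L + 1))
    a[0, 0] = a00
    for l, (a_l0, a_ll) in enumerate(pairs, start=1):
        a[l, 0] = a_l0   # first column
        a[l, l] = a_ll   # diagonal entry
    return a

def lag_tail_dependence(a, k):
    """Evaluate (52) for a finite matrix; entries outside the matrix are zero."""
    a = np.asarray(a, dtype=float)
    n_cols = a.shape[1]
    if k >= n_cols:
        return 0.0
    # Pair column j with column j + k in every row and sum the minima.
    return float(np.minimum(a[:, :n_cols - k], a[:, k:]).sum())

# Illustrative weights summing to one, with L = 2.
a = sparse_weights(0.4, [(0.2, 0.1), (0.2, 0.1)])
lambdas = [lag_tail_dependence(a, k) for k in range(4)]
```

With these weights, row l contributes min(a_{l0}, a_{ll}) to the lag-l index only, which matches the statement that the lag-k index is characterised by $a_{k 0}$ and $a_{k k}$.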
Using the structure of (53), Zhang (Citation2005) proposed three models for financial time series. They are presented next.

Model 4.1

Combining MM (used to model scales) with a Markov process (used to model signs): two models for transformed negative $(-)$ returns and positive $(+)$ returns are $Y_{i}^{\pm} = max_{0 \leq l \leq L^{\pm}} max_{0 \leq k \leq K^{\pm}} a_{l k}^{\pm} Z_{l, i - k}^{\pm}, - \infty < i < \infty,$ where the superscript $^{-}$ means that the model is for negative returns only, and $^{+}$ means that the model is for positive returns only. In the following, we only discuss the model for negative returns; the model for positive returns is obtained by simply replacing $^{-}$ by $^{+}$ . Constants ${a_{l k}^{-}}$ are nonnegative and satisfy $\sum_{l = 0}^{L^{-}} \sum_{k = 0}^{K^{-}} a_{l k}^{-} = 1$ . The matrix of weights is $(a_{l k}^{-}) = (\begin{matrix} a_{00}^{-} & 0 & 0 & 0 & \dots & 0 \\ a_{10}^{-} & a_{11}^{-} & 0 & 0 & \dots & 0 \\ a_{20}^{-} & 0 & a_{22}^{-} & 0 & \dots & 0 \\ ⋮ & ⋮ & ⋮ & ⋮ & ⋱ & 0 \\ a_{L^{-} 0}^{-} & 0 & 0 & 0 & \dots & a_{L^{-} L^{-}}^{-} \end{matrix}) .$ ${Z_{l i}^{-}, l = 1, \dots, L^{-}, - \infty < i < \infty}$ is an independent array, where the random variables $Z_{l i}^{-}$ are identically unit Fréchet distributed. Let (54) $\begin{aligned} R_{i}^{-} = ξ_{i}^{-} Y_{i}^{-}, - \infty < i < \infty, \end{aligned}$ (54) where the process ${ξ_{i}^{-}}$ is independent of ${Y_{i}^{-}}$ and takes values in the finite set {0, 1}, i.e., ${ξ_{i}^{-}}$ is a sign process. Here ${Y_{i}^{-}}$ is an MM process, ${ξ_{i}^{-}}$ is a simple Markov process, and ${R_{i}^{-}}$ is the negative return process. For simplicity, model (54) is referred to as an MCMM process.
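A minimal simulation sketch of the construction in (54) is given below, assuming a two-state Markov chain for the sign process with a user-supplied transition matrix. The function name `simulate_mcmm`, the transition probabilities, and the weight values are hypothetical; unit Fréchet variables are generated as reciprocals of standard exponential variables.

```python
import numpy as np

def simulate_mcmm(a, trans, n, seed=0):
    """Simulate R_i = xi_i * Y_i for i = 0, ..., n - 1.

    a     : (L+1) x (K+1) nonnegative weight matrix summing to one
    trans : 2 x 2 transition matrix of the {0, 1}-valued chain (an assumption)
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    K = a.shape[1] - 1
    # Unit Frechet draws: if E ~ Exp(1) then P(1/E <= z) = exp(-1/z).
    # Column c of z holds the shocks at time c - K.
    z = 1.0 / rng.standard_exponential(size=(a.shape[0], n + K))
    y = np.empty(n)
    for i in range(n):
        # Reverse the window so that position k holds Z_{l, i-k}.
        window = z[:, i:i + K + 1][:, ::-1]
        y[i] = (a * window).max()
    # Two-state Markov chain for the sign process.
    xi = np.empty(n, dtype=int)
    xi[0] = rng.integers(2)
    for i in range(1, n):
        xi[i] = rng.random() < trans[xi[i - 1]][1]
    return xi * y, y, xi

# Illustrative weights with the sparse structure (L = K = 2).
a = np.array([[0.4, 0.0, 0.0],
              [0.2, 0.1, 0.0],
              [0.2, 0.0, 0.1]])
r, y, xi = simulate_mcmm(a, [[0.6, 0.4], [0.3, 0.7]], n=20000, seed=1)
```

Because the weights sum to one, the simulated scale process should have unit Fréchet margins, so the empirical frequency of values below 1 should be close to exp(-1).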

Remark 4.1

If ${Y_{i}^{-}}$ is an independent process, then $P (R_{i + r}^{-} > u | R_{i}^{-} > u) \to 0$ as $u \to \infty$ for i>0, r>0, i.e., no tail dependence exists. This shows that if the observed process is tail dependent, a model whose time dependence enters only through a Markov chain cannot capture that tail dependence unless the random variables used to model scales are themselves tail dependent.

Remark 4.2

Empirical studies show that negative returns $Y_{i}^{-}$ and positive returns $Y_{i}^{+}$ are asymmetric, which suggests that models for positive returns should differ from models for negative returns. Notice that at any time i, one can observe only one of $Y_{i}^{-}$ and $Y_{i}^{+}$ ; the other one is missing. By introducing the Markov processes $ξ_{i}^{-}$ and $ξ_{i}^{+}$ , both $R_{i}^{-}$ and $R_{i}^{+}$ in (54) are observable. We use $R_{i}^{-}$ and $R_{i}^{+}$ to construct parameter estimators.

Model 4.2

An MCMM process model for returns: with the notation established in (54), let (55) $\begin{aligned} R_{i} & = sign (ξ_{i}) * [I_{ξ_{i} = - 1} Y_{i}^{-} + I_{ξ_{i} = 1} Y_{i}^{+}], \\ - \infty < i < \infty, \end{aligned}$ (55) where the process ${ξ_{i}}$ is a simple Markov process which is independent of ${Y_{i}^{\pm}}$ and takes values in the finite set ${- 1, 0, 1}$ . ${R_{i}}$ is the return process.

Remark 4.3

The processes ${ξ_{i}^{-}}$ , ${ξ_{i}^{+}}$ may be Bernoulli processes or Markov processes taking values in a finite set. The process ${ξ_{i}}$ may be considered as an independent process or a Markov process taking values in a finite set.

Remark 4.4

In model (55), as long as $Y_{i}^{-}$ , $Y_{i}^{+}$ , and $ξ_{i}$ are determined, $R_{i}$ is determined.

Remark 4.5

In many applications, only positive observed values are of concern. Insurance claims, annual maxima of precipitation, file sizes, and durations in internet traffic at a certain point are examples that take positive values only. Even in our negative return model, the values have been converted into positive values.

4.6.2. Sparse random coefficient M4 processes

One feature of M4 processes is their signature patterns. To fit the data better, we may need a large number of patterns. One way to remove this feature is to let the moving coefficients be random. Tang et al. (Citation2013) considered a sparse M4 random coefficient model (SM4R), which has a parsimonious number of parameters and can potentially capture the major stylised facts exhibited by devolatised financial time series found in empirical studies. They demonstrated through real data analysis that the SM4R model can effectively be used to improve the estimates of the value at risk for portfolios consisting of multivariate financial returns, whereas ignoring either temporal or cross-sectional tail dependence could result in a serious underestimate of market risk.

The SM4R model is defined as (56) $\begin{aligned} X_{t d} & = max [B_{d}^{(t)} \cdot Z_{d}^{(t)}], \\ d = 1, \dots, D, - \infty < t < \infty, \end{aligned}$ (56) $\begin{aligned} B_{d}^{(t)} = (\begin{array}{ccccc} β_{00 d}^{(t)} & 0 & 0 & \dots & 0 \\ β_{10 d}^{(t)} & β_{11 d}^{(t)} & 0 & \dots & 0 \\ β_{20 d}^{(t)} & 0 & β_{22 d}^{(t)} & \dots & 0 \\ ⋮ & ⋮ & ⋮ & ⋱ & ⋮ \\ β_{L_{d} 0 d}^{(t)} & 0 & 0 & \dots & β_{L_{d} L_{d} d}^{(t)} \end{array}), \\ Z_{d}^{(t)} = (\begin{array}{cccc} Z_{0 t d} & Z_{0, t - 1, d} & \dots & Z_{0, t - L_{d}, d} \\ Z_{1 t d} & Z_{1, t - 1, d} & \dots & Z_{1, t - L_{d}, d} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ Z_{L_{d} t d} & Z_{L_{d}, t - 1, d} & \dots & Z_{L_{d}, t - L_{d}, d} \end{array}), \end{aligned}$ where ${Z_{0 t} = (Z_{0 t 1}, \dots, Z_{0 t D})}$ is a sequence of i.i.d. D-dimensional random vectors (across t) having a multivariate extreme value distribution function with unit Fréchet margins, ${Z_{l t d}}$ are i.i.d. unit Fréchet random variables for $l \geq 1$ , $β_{00 d}^{(t)} = b_{0 d}$ , $β_{l 0 d}^{(t)} = a_{l t d} b_{l d}$ and $β_{l l d}^{(t)} = (1 - a_{l t d}) b_{l d}$ , $l = 1, \dots, L_{d}$ , $d = 1, \dots, D$ , ${a_{l t d}}$ are i.i.d. random variables on interval $[0, 1]$ , $b = {b_{l d}, l = 1, \dots, L_{d}, d = 1, \dots, D}$ are positive constants with $\sum_{l = 0}^{L_{d}} b_{l d} = 1$ for any d, and ${Z_{0 t}}$ , ${a_{l t d}}$ , and ${Z_{l t d} : l \geq 1}$ are independent with each other.

The cross-sectional tail dependence at time t is characterised by the copula function of $Z_{0 t}$ and tuned by ${b_{0 d}, d = 1, \dots, D}$ . With this setup, all kinds of parametric multivariate extreme value distributions can naturally be incorporated into the SM4R structure so that a parsimonious model with satisfactory level of generality can be achieved. This contrasts with the classical M4 setting where all components depend on the same set of shock variables $Z_{l t}$ , which inherently restricts the dependence structure to a given type and often requires a large number of parameters to achieve satisfactory performance.

For any positive integer r and positive constants $x = {x_{t d}, t = 1, \dots, r, d = 1, \dots, D}$ , the joint distribution function of ${X_{t d}, t = 1, \dots, r, d = 1, \dots, D}$ conditional on the generic random vector $a$ representing all the $a_{l k d}$ 's involved is (57) $\begin{aligned} P (X_{t d} \leq x_{t d}, 1 \leq t \leq r, 1 \leq d \leq D | a) \\ = \exp \{- V (x, a; b)\}, \end{aligned}$ (57) where $V (x, a; b)$ is defined as $\begin{aligned} V (x, a; b) & = \sum_{t = 1}^{r} V_{*} (\frac{x_{t 1}}{b_{01}}, \dots, \frac{x_{t D}}{b_{0 D}}) \\ + \sum_{d = 1}^{D} \sum_{l = 1}^{L} b_{l d} [\sum_{j = 1}^{min (l, r)} \frac{1 - a_{l j d}}{x_{j d}} \\ + \sum_{j^{'} = 1}^{max (r - l, 0)} max (\frac{a_{l j^{'} d}}{x_{j^{'} d}}, \frac{1 - a_{l, j^{'} + l, d}}{x_{j^{'} + l, d}}) \\ + \sum_{j^{''} = max (r - l, 0) + 1}^{r} \frac{a_{l j^{''} d}}{x_{j^{''} d}}], \end{aligned}$ $\exp (- V_{*} (\cdot)) = \exp (- V_{*} (\cdot | θ_{Z_{0}}))$ is the multivariate extreme value distribution of $Z_{0 t}$ , and $V_{*} (\cdot)$ is called the exponent measure of $Z_{0 t}$ (e.g., Resnick, Citation1987). A proof of (57) can be found in Tang et al. (Citation2013). The marginal distribution of $X_{t d}$ is still unit Fréchet and the multivariate distribution function of $(X_{t 1}, \dots, X_{t D})$ is (58) $\begin{aligned} P (X_{t d} \leq x_{t d}, 1 \leq d \leq D) \\ = \exp \{- \sum_{d = 1}^{D} \frac{1}{x_{t d}} (1 - b_{0 d}) - V_{*} (\frac{x_{t 1}}{b_{01}}, \dots, \frac{x_{t D}}{b_{0 D}})\}, \end{aligned}$ (58) a new multivariate extreme value distribution function whose dependence is characterised by a mixture of independent and extreme value copulas.

One of the most popular multivariate extreme value distributions in practice is the logistic distribution (Gumbel-Hougaard copula with unit Fréchet margins) defined as: (59) $\begin{aligned} G_{\log} (x; α) = \exp \{- {(\sum_{d = 1}^{D} x_{d}^{- 1 / α})}^{α}\}, \end{aligned}$ (59) where $α \in (0, 1]$ . When $Z_{0 t} \sim G_{\log} (\cdot; α)$ , the joint distribution function (58) becomes (60) $\begin{aligned} P (X_{t d} \leq x_{t d}, 1 \leq d \leq D) \\ = \exp \{- \sum_{d = 1}^{D} \frac{1}{x_{t d}} (1 - b_{0 d}) - {[\sum_{d = 1}^{D} {(\frac{b_{0 d}}{x_{t d}})}^{\frac{1}{α}}]}^{α}\} . \end{aligned}$ (60) When D = 2, the cross-sectional bivariate distribution defined by (60) is just the asymmetric logistic distribution proposed by Tawn (Citation1988). Interestingly, the copula function of (60) is $\begin{aligned} C (u_{1}, \dots, u_{D}) & = C_{\log} (u_{1}^{b_{01}}, \dots, u_{D}^{b_{0 D}}) \\ \times C_{⊥} (u_{1}^{1 - b_{01}}, \dots, u_{D}^{1 - b_{0 D}}), \end{aligned}$ where $C_{\log}$ and $C_{⊥}$ are the Gumbel-Hougaard copula and independent copula, respectively.
In general, if ${C_{j}, j = 1, \dots, P}$ are P D-dimensional copulas and ${b_{j d}}$ are any positive constants satisfying $\sum_{j = 1}^{P} b_{j d} = 1$ for $d = 1, \dots, D$ , then the function $C^{*}$ constructed as $C^{*} (u_{1}, \dots, u_{D}) = \prod_{j = 1}^{P} C_{j} (u_{1}^{b_{j 1}}, \dots, u_{D}^{b_{j D}})$ is still a copula function associated with a D-dimensional distribution function. To see this, consider the process ${Y_{t d}}$ defined as $Y_{t d} = max_{1 \leq j \leq P} b_{j d} Z_{t j d}$ for $d = 1, \dots, D$ , where ${(Z_{t j 1}, \dots, Z_{t j D}), j = 1, \dots, P, t = 1, \dots,}$ are i.i.d. D-variate random vectors with copula $C_{j}$ and unit Fréchet margins. It can be checked that $C^{*}$ is the copula of $(Y_{t 1}, \dots, Y_{t D})$ .
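The product construction of $C^{*}$ can be checked numerically in a simple case. The sketch below combines a Gumbel-Hougaard copula with the independence copula, as in the copula of (60), and evaluates the product at a few points; the function names and the exponents in `b` are illustrative assumptions, and the check of uniform margins is a sanity check rather than a proof.

```python
import numpy as np

def gumbel_copula(u, alpha):
    """Gumbel-Hougaard copula with parameter alpha in (0, 1], as in (59)."""
    u = np.asarray(u, dtype=float)
    s = np.sum((-np.log(u)) ** (1.0 / alpha))
    return float(np.exp(-(s ** alpha)))

def independence_copula(u):
    """Independence copula: the product of its arguments."""
    return float(np.prod(np.asarray(u, dtype=float)))

def product_copula(u, copulas, b):
    """C*(u) = prod_j C_j(u_1^{b_j1}, ..., u_D^{b_jD}).

    b has shape (P, D); each column must sum to one.
    """
    u = np.asarray(u, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.prod([c(u ** b[j]) for j, c in enumerate(copulas)]))

# P = 2 copulas in D = 2 dimensions; the exponents are hypothetical.
copulas = [lambda v: gumbel_copula(v, alpha=0.5), independence_copula]
b = np.array([[0.7, 0.4],
              [0.3, 0.6]])

margin_1 = product_copula([0.3, 1.0], copulas, b)   # should equal 0.3
margin_2 = product_copula([1.0, 0.8], copulas, b)   # should equal 0.8
inside = product_copula([0.3, 0.8], copulas, b)
```

Setting all but one argument to 1 recovers that argument because the exponents in each column of `b` sum to one, which is the margin condition a copula must satisfy.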

Besides the properties of model (56) discussed above, many other related properties and developments of the model can be found in Tang et al. (Citation2013). The estimators for the model parameters are constructed using the generalised method of moments (GMM) approach; we refer to Tang et al. (Citation2013) for details.

4.6.3. Copula structured M4 processes

Statistical applications of classical parametric max-stable processes are still sparse mostly due to lack of (1) efficiency of statistical estimation of many parameters in the processes, (2) flexibility of concurrently modeling asymptotic independence, and asymptotic dependence among variables, and (3) capability of fitting real data directly. Zhang and Zhu (Citation2016) studied a more flexible model, i.e., a class of copula structured M4 (multivariate maxima and moving maxima) processes, and hence CSM4 for short. CSM4 processes are constructed by incorporating sparse random coefficients and structured extreme value copulas in asymptotically (in)dependent M4 (AIM4) processes. It is shown that the new model overcomes all of the aforementioned three constraints. They illustrated new features and advantages of the CSM4 model using simulated examples and real data of intra-daily maxima of high-frequency financial time series. They also studied the probabilistic properties of the proposed model and its statistical inference.

Zhang and Zhu (Citation2016) first proposed a new model that is suited to marginally transformed observations. It is defined as: (61) $\begin{aligned} Y_{t d} & = max (W_{t d}^{1 / β_{d}}, max [A_{t d} \cdot Z_{t}]), \\ d = 1, \dots, D, - \infty < t < \infty, \end{aligned}$ (61) where $β_{d} > 0, d = 1, \dots, D$ ; ${W_{t}, - \infty < t < \infty} = {(W_{t 1}, \dots, W_{t D}), - \infty < t < \infty}$ is a sequence of i.i.d. D-dimensional random vectors following the logistic distribution defined the same as (59) with $x = (x_{1}, \dots, x_{D})$ and $γ = 1 / α \geq 1$ . $A_{t d}$ is a sparse random loading matrix having the form: $\begin{aligned} A_{t d} = (\begin{array}{ccccc} α_{1 d} U_{1 t d} & α_{1 d} (1 - U_{1 t d}) & 0 & \dots & 0 \\ α_{2 d} U_{2 t d} & 0 & α_{2 d} (1 - U_{2 t d}) & \dots & 0 \\ ⋮ & ⋮ & ⋮ & ⋱ & ⋮ \\ α_{L d} U_{L t d} & 0 & 0 & \dots & α_{L d} (1 - U_{L t d}) \end{array}) \end{aligned}$ with $α_{l d} \geq 0$ , $\sum_{l = 1}^{L} α_{l d} = 1$ for each d, and ${U_{l t d}, l = 1, \dots, L, - \infty < t < \infty, d = 1, \dots, D}$ being i.i.d. nondegenerate random variables on $[0, 1]$ . For $- \infty < t < \infty$ , $Z_{t} = {(Z_{l, t - j + 1}), l = 1, \dots, L; j = 1, \dots, L + 1}$ is an independent array, with $Z_{l, t - j}$ 's being unit Fréchet random variables; $A_{t d} \cdot Z_{t}$ represents the componentwise products between matrices $A_{t d}$ and $Z_{t}$ at time t, and $max C$ takes the maximum over all elements of matrix $C$ . ${Z_{l t}}$ , ${U_{l t d}}$ and ${W_{t}}$ are assumed to be independent of each other.

In the second step, assuming that ${X_{t} = (X_{t 1}, \dots, X_{t D}), t = 1, 2, \dots}$ is an observable multivariate stationary time series, they generalised (Equation61(61) $\begin{aligned} Y_{t d} & = max (W_{t d}^{1 / β_{d}}, max [A_{t d} \cdot Z_{t}]), \\ d = 1, \dots, D, - \infty < t < \infty, \end{aligned}$ (61) ) to a directly applicable model: (62) $\begin{aligned} X_{t d} = C_{d} Y_{t d}^{1 / ψ_{d}}, d = 1, \dots, D, \end{aligned}$ (62) where $C_{d} > 0$ is a scale parameter, $ψ_{d} > 0$ is a shape parameter for $d = 1, \dots, D$ .

Proposition 4.5

For a CSM4 process defined by (61), the serial¹ and cross-sectional asymptotic dependence index $λ_{d d_{r}^{'}}, r = 1, \dots, L,$ (here, $λ_{d d_{r}^{'}}$ stands for the tail dependence index between $X_{t d}$ and $X_{t + r, d^{'}}$ ) and the cross-sectional asymptotic dependence index $λ_{d d^{'}}$ are presented in Table 1. When r>L, $λ_{d d_{r}^{'}} = 0$ .

Table 1. The asymptotic dependence index $λ_{d d_{r}^{'}}, 1 \leq r \leq L$ , and $λ_{d d^{'}}$ .


The main differences between the SM4R model (56) and the CSM4 model (62) are that model (62) can be directly applied to real data and that it can handle both asymptotic independence and asymptotic dependence, as shown in Table 1. As with the inference for SM4R models, the parameter estimation is based on the generalised method of moments approach; see Zhang and Zhu (Citation2016) for details.

4.7. Approximating a general process by a finite representation: theory

Approximating (33) by a finite representation, as in Sections 4.5 and 4.6, needs theoretical justification. This section provides the necessary theoretical results. For completeness, proofs of the theoretical results are provided. More details can be found in Zhang (Citation2009).

4.7.1. Convergence in probability for the finitely discrete time domain processes

Lemma 4.6

Suppose $α_{k} (δ) \geq 0$ , and $\begin{aligned} \sum_{- \infty < k < \infty} α_{k} (δ) = 1, \sum_{| k | > K} α_{k} (δ) = δ, \\ X^{δ} = \underset{| k | > K}{⋁} α_{k} (δ) Z_{k}, Y^{δ} = \underset{| k | \leq K}{⋁} α_{k} (δ) Z_{k}, \end{aligned}$ where ${Z_{k}}$ are i.i.d. unit Fréchet random variables, K is a fixed number, and $δ > 0$ . Let $U^{δ} = \frac{1}{1 - δ} Y^{δ},$ then for any $ϵ > 0$ (63) $\begin{aligned} lim_{δ \to 0} P (| U^{δ} - X^{δ} \lor Y^{δ} | > ϵ) = 0. \end{aligned}$ (63)

Proof.

First, we have $\begin{aligned} P (X^{δ} \leq x) & = P (⋂_{| k | > K} α_{k} (δ) Z_{k} \leq x) \\ = \prod_{| k | > K} P (α_{k} (δ) Z_{k} \leq x) \\ = \prod_{| k | > K} e^{- \frac{α_{k} (δ)}{x}} = e^{- \frac{\sum_{| k | > K} α_{k} (δ)}{x}} = e^{- \frac{δ}{x}} . \end{aligned}$ It is easy to check that $Y^{δ}, X^{δ} \lor Y^{δ}, U^{δ}$ have the distributions: $Y^{δ} \sim e^{- \frac{1 - δ}{y}}, X^{δ} \lor Y^{δ} \sim e^{- \frac{1}{x}}, U^{δ} \sim e^{- \frac{1}{z}} .$ Since $\begin{aligned} P (X^{δ} > Y^{δ}) & = \int_{0}^{\infty} P (X^{δ} > y) \frac{1 - δ}{y^{2}} e^{- \frac{1 - δ}{y}} d y \\ = \int_{0}^{\infty} (1 - e^{- \frac{δ}{y}}) \frac{1 - δ}{y^{2}} e^{- \frac{1 - δ}{y}} d y \\ = 1 - (1 - δ) \int_{0}^{\infty} \frac{1}{y^{2}} e^{- \frac{1}{y}} d y \\ = δ, \\ P (U^{δ} - Y^{δ} > ϵ) & = P ((\frac{1}{1 - δ} - 1) Y^{δ} > ϵ) \\ = P (\frac{δ}{1 - δ} Y^{δ} > ϵ) \\ = P (Y^{δ} > \frac{(1 - δ) ϵ}{δ}) \\ = 1 - e^{- \frac{(1 - δ) δ}{(1 - δ) ϵ}} = 1 - e^{- \frac{δ}{ϵ}} . \end{aligned}$ then $\begin{aligned} P (| U^{δ} - X^{δ} \lor Y^{δ} | > ϵ) \\ = P (U^{δ} - X^{δ} \lor Y^{δ} > ϵ) + P (X^{δ} \lor Y^{δ} - U^{δ} > ϵ) \\ = P (U^{δ} - X^{δ} > ϵ, X^{δ} > Y^{δ}) \\ + P (U^{δ} - Y^{δ} > ϵ, Y^{δ} > X^{δ}) \\ + P (X^{δ} - U^{δ} > ϵ, X^{δ} > Y^{δ}) \\ + P (Y^{δ} - U^{δ} > ϵ, Y^{δ} > X^{δ}) \\ \leq 2 P (X^{δ} > Y^{δ}) + P (U^{δ} - Y^{δ} > ϵ) + 0 \\ = 2 δ + 1 - e^{- \frac{δ}{ϵ}} \end{aligned}$ which proves the assertion.
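The bound obtained in this proof can be checked by simulation. The sketch below assumes that it suffices to draw $X^{δ}$ and $Y^{δ}$ directly from the marginal laws derived above (Fréchet with scales δ and 1 - δ, independent because they are built from disjoint sets of $Z_{k}$ ), rather than from a specific weight sequence; the function names and the parameter values are illustrative.

```python
import numpy as np

def lemma_bound(delta, eps):
    """Upper bound 2*delta + 1 - exp(-delta/eps) from the proof of Lemma 4.6."""
    return 2.0 * delta + 1.0 - np.exp(-delta / eps)

def empirical_prob(delta, eps, n=200_000, seed=0):
    """Monte Carlo estimate of P(|U - max(X, Y)| > eps)."""
    rng = np.random.default_rng(seed)
    # 1/E with E ~ Exp(1) is unit Frechet; scaling gives exp(-delta/x) etc.
    x = delta / rng.standard_exponential(n)          # X^delta
    y = (1.0 - delta) / rng.standard_exponential(n)  # Y^delta
    u = y / (1.0 - delta)                            # U^delta
    return float(np.mean(np.abs(u - np.maximum(x, y)) > eps))

estimate = empirical_prob(delta=0.05, eps=0.5)
bound = lemma_bound(delta=0.05, eps=0.5)
```

For small δ the estimate should sit below the bound, and both shrink towards zero as δ decreases, in line with (63).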

Remark

Equation (63) means that for sufficiently small δ, the random variables $U^{δ}$ and $X^{δ} \lor Y^{δ}$ satisfy (64) $\begin{aligned} P (| U^{δ} - X^{δ} \lor Y^{δ} | > ϵ) < ϵ . \end{aligned}$ (64)

Lemma 4.7

Suppose $α_{k} (δ) \geq 0$ , and $\begin{aligned} \sum_{- \infty < k < \infty} α_{k} (δ) = 1, \sum_{| k | > K} α_{k} (δ) = δ, \end{aligned}$ $\begin{aligned} X_{i}^{δ} = \underset{| k | > K}{⋁} α_{k} (δ) Z_{i - k}, \\ Y_{i}^{δ} = \underset{| k | \leq K}{⋁} α_{k} (δ) Z_{i - k}, i = 1, \dots, n \end{aligned}$ where n is a finite number, ${Z_{k}}$ are i.i.d. unit Fréchet random variables, and K is a fixed number. Let $U_{i}^{δ} = \frac{1}{1 - δ} Y_{i}^{δ},$ then (65) $\begin{aligned} lim_{δ \to 0} P (⋃_{i = 1}^{n} {| U_{i}^{δ} - X_{i}^{δ} \lor Y_{i}^{δ} | > ϵ}) = 0. \end{aligned}$ (65)

Proof.

From Lemma 4.6, we have $P (| U_{i}^{δ} - X_{i}^{δ} \lor Y_{i}^{δ} | > ϵ) \leq 2 δ + 1 - e^{- \frac{δ}{ϵ}}$ for each i. Hence $\begin{aligned} P (⋃_{i = 1}^{n} {| U_{i}^{δ} - X_{i}^{δ} \lor Y_{i}^{δ} | > ϵ}) \\ \leq \sum_{i = 1}^{n} P (| U_{i}^{δ} - X_{i}^{δ} \lor Y_{i}^{δ} | > ϵ) \\ \leq n (2 δ + 1 - e^{- \frac{δ}{ϵ}}) \end{aligned}$ and the right-hand side tends to 0 as $δ \to 0$ because n is finite, which proves (65).

Remark

Equation (65) implies that, for a fixed K, if $\sum_{| k | > K} α_{k} (δ) = δ$ is sufficiently small, the process $X_{i}^{δ} \lor Y_{i}^{δ} = \underset{- \infty < k < \infty}{⋁} α_{k} (δ) Z_{i - k}, i = 1, 2, \dots, n$ can be closely approximated by the process $\begin{aligned} U_{i}^{δ} & = \frac{1}{1 - δ} Y_{i}^{δ} = \frac{1}{1 - δ} \underset{| k | \leq K}{⋁} α_{k} (δ) Z_{i - k}, \\ i = 1, 2, \dots, n \end{aligned}$ in the sense of (66) $\begin{aligned} P (⋃_{i = 1}^{n} | U_{i}^{δ} - X_{i}^{δ} \lor Y_{i}^{δ} | > ϵ) < ϵ . \end{aligned}$ (66)

Lemma 4.8

Suppose $α_{k} \geq 0$ , and $\begin{aligned} \sum_{- \infty < k < \infty} α_{k} = 1, \sum_{| k | > K (δ)} α_{k} = δ, \\ X^{δ} = \underset{| k | > K (δ)}{⋁} α_{k} Z_{k}, Y^{δ} = \underset{| k | \leq K (δ)}{⋁} α_{k} Z_{k}, \end{aligned}$ where ${Z_{k}}$ are i.i.d. unit Fréchet random variables. Let $U^{δ} = \frac{1}{1 - δ} Y^{δ},$ then for any $ϵ > 0$ , there exist $δ_{0} (ϵ)$ and finite number $K_{0} (δ_{0} (ϵ))$ such that (67) $\begin{aligned} P (| U^{δ_{0} (ϵ)} - X^{δ_{0} (ϵ)} \lor Y^{δ_{0} (ϵ)} | > ϵ) < ϵ . \end{aligned}$ (67)

Proof.

Following the lines of the proof of Lemma 4.6, we have $P (| U^{δ} - X^{δ} \lor Y^{δ} | > ϵ) \leq 2 δ + 1 - e^{- \frac{δ}{ϵ}} .$ It is easy to check that $g (x) = 2 x + 1 - e^{- x / ϵ}$ is a continuous, strictly increasing function with $g (0) = 0$ , so there exists $δ_{0}^{*} (ϵ)$ such that $2 δ_{0}^{*} (ϵ) + 1 - e^{- \frac{δ_{0}^{*} (ϵ)}{ϵ}} = ϵ .$ So $P (| U^{δ_{0}^{*} (ϵ)} - X^{δ_{0}^{*} (ϵ)} \lor Y^{δ_{0}^{*} (ϵ)} | > ϵ) \leq ϵ .$ Since $\sum_{- \infty < k < \infty} α_{k} = 1$ , the tail sums tend to zero, so there exists a finite number $K_{0} (δ_{0} (ϵ))$ such that $\sum_{| k | > K_{0} (δ_{0} (ϵ))} α_{k} = δ_{0} (ϵ)$ where $δ_{0} (ϵ) < δ_{0}^{*} (ϵ),$ so $\begin{aligned} P (| U^{δ_{0} (ϵ)} - X^{δ_{0} (ϵ)} \lor Y^{δ_{0} (ϵ)} | > ϵ) \\ \leq 2 δ_{0} (ϵ) + 1 - e^{- \frac{δ_{0} (ϵ)}{ϵ}} < ϵ \end{aligned}$ and the proof is then completed.
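The threshold $δ_{0}^{*} (ϵ)$ used in this proof has no closed form, but it can be located numerically. The following sketch uses plain bisection on $g$; the function names and the bracketing interval $[0, ϵ]$ are choices made here for illustration (the bracket is valid because $g (ϵ) = 2 ϵ + 1 - e^{- 1} > ϵ$ ).

```python
import math

def g(delta, eps):
    """g(delta) = 2*delta + 1 - exp(-delta/eps), strictly increasing in delta."""
    return 2.0 * delta + 1.0 - math.exp(-delta / eps)

def delta_star(eps, iterations=200):
    """Solve g(delta) = eps by bisection on [0, eps]."""
    lo, hi = 0.0, eps          # g(lo) = 0 < eps < g(hi)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid, eps) < eps:
            lo = mid           # root lies to the right of mid
        else:
            hi = mid           # root lies at or to the left of mid
    return lo

eps = 0.1
d_star = delta_star(eps)
```

Any truncation point whose tail weight falls below `d_star` then satisfies the approximation requirement (67) for the given ϵ.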

The following lemma is immediate.

Lemma 4.9

Suppose $α_{k} \geq 0$ , and $\begin{aligned} \sum_{- \infty < k < \infty} α_{k} = 1, \sum_{| k | > K (δ)} α_{k} = δ, \\ X_{i}^{δ} = \underset{| k | > K (δ)}{⋁} α_{k} Z_{i - k}, \\ Y_{i}^{δ} = \underset{| k | \leq K (δ)}{⋁} α_{k} Z_{i - k}, i = 1, \dots, n \end{aligned}$ where n is a finite number and ${Z_{k}}$ are i.i.d. unit Fréchet random variables. Let $U_{i}^{δ} = \frac{1}{1 - δ} Y_{i}^{δ},$ then for any $ϵ > 0$ , there exist $δ_{0} (ϵ)$ and a finite number $K_{0} (δ_{0} (ϵ))$ such that (68) $\begin{aligned} P (⋃_{i = 1}^{n} | U_{i}^{δ_{0} (ϵ)} - X_{i}^{δ_{0} (ϵ)} \lor Y_{i}^{δ_{0} (ϵ)} | > ϵ) < ϵ . \end{aligned}$ (68)

Remark

If $\sum_{| k | > K (δ)} α_{k} = δ$ is sufficiently small, the process $X_{i}^{δ} \lor Y_{i}^{δ} = \underset{- \infty < k < \infty}{⋁} α_{k} Z_{i - k}, i = 1, 2, \dots, n$ can be closely approximated by the process $\begin{aligned} U_{i}^{δ} & = \frac{1}{1 - δ} Y_{i}^{δ} = \frac{1}{1 - δ} \underset{| k | \leq K (δ)}{⋁} α_{k} Z_{i - k}, \\ i = 1, 2, \dots, n \end{aligned}$ in the sense of $P (⋃_{i = 1}^{n} | U_{i}^{δ_{0} (ϵ)} - X_{i}^{δ_{0} (ϵ)} \lor Y_{i}^{δ_{0} (ϵ)} | > ϵ) < ϵ .$

4.7.2. Some results on almost sure convergence and infinitely discrete time domain

In Section 4.7.1, we considered convergence in probability as $δ \to 0$ . In this section we consider a sequence ${δ_{t}}$ with $δ_{t} \to 0$ as $t \to \infty$ , and study almost sure convergence over both infinite and finite discrete time domains.

Lemma 4.10

For any given $ϵ > 0$ and $γ > 0$ , let $δ_{t}$ be small and satisfy $γ 2^{- t} \geq 2 δ_{t} + 1 - e^{- \frac{δ_{t}}{ϵ / 2}}, δ_{t} > 0, t = 1, 2, \dots .$ For a fixed K, let $\begin{aligned} \sum_{- \infty < k < \infty} α_{k} (t) = 1, \sum_{| k | > K} α_{k} (t) = δ_{t}, α_{k} (t) \geq 0, \\ X^{t} = \underset{| k | > K}{⋁} α_{k} (t) Z_{k}, Y^{t} = \underset{| k | \leq K}{⋁} α_{k} (t) Z_{k}, \\ U^{δ_{t}} = \frac{1}{1 - δ_{t}} Y^{t} . \end{aligned}$ Then (69) $\begin{aligned} U^{δ_{t}} - X^{t} \lor Y^{t} \overset{a . s .}{⟶} 0. \end{aligned}$ (69)

Proof.

First we have $\begin{aligned} P (lim_{t \to \infty} | U^{δ_{t}} - X^{t} \lor Y^{t} | > ϵ) \\ \leq P ({| U^{δ_{t}} - X^{t} \lor Y^{t} | > ϵ / 2}, i . o .) \\ = P (⋂_{t = 1}^{\infty} ⋃_{j = t}^{\infty} {| U^{δ_{j}} - X^{j} \lor Y^{j} | > ϵ / 2}) \\ = lim_{t \to \infty} P (⋃_{j = t}^{\infty} {| U^{δ_{j}} - X^{j} \lor Y^{j} | > ϵ / 2}) \\ \leq lim_{t \to \infty} \sum_{j = t}^{\infty} P ({| U^{δ_{j}} - X^{j} \lor Y^{j} | > ϵ / 2}) \\ \leq lim_{t \to \infty} γ 2^{- t + 1} = 0. \end{aligned}$ Since ϵ is arbitrary, we have $P (lim_{t \to \infty} (U^{δ_{t}} - X^{t} \lor Y^{t}) \neq 0) = 0$ which shows (69).

Lemma 4.11

Suppose $δ_{t}$ , $α_{k} (t)$ are defined the same as in Lemma 4.10, let $\begin{aligned} X_{i}^{t} = \underset{| k | > K}{⋁} α_{k} (t) Z_{i - k}, Y_{i}^{t} = \underset{| k | \leq K}{⋁} α_{k} (t) Z_{i - k}, \\ U_{i}^{δ_{t}} = \frac{1}{1 - δ_{t}} Y_{i}^{t}, i = 1, 2, \dots \end{aligned}$ then for each i, $U_{i}^{δ_{t}} - X_{i}^{t} \lor Y_{i}^{t} \overset{a . s .}{⟶} 0$ and (70) $\begin{aligned} U_{i}^{δ_{t}} - X_{i}^{t} \lor Y_{i}^{t} \overset{a . s .}{⟶} 0, all i, as t \to \infty \end{aligned}$ (70)

Proof.

By Lemma 4.10, it is obvious that for each i, $U_{i}^{δ_{t}} - X_{i}^{t} \lor Y_{i}^{t} \overset{a . s .}{⟶} 0$ . Since the index set of i is countable, (70) is immediate.

Lemma 4.12

For finitely discrete time domain $i = 1, 2, \dots, n$ and the conditions in Lemma 4.11, we have (71) $\begin{aligned} P (lim_{t \to \infty} sup_{i \leq n} | U_{i}^{δ_{t}} - X_{i}^{t} \lor Y_{i}^{t} | > ϵ) = 0. \end{aligned}$ (71)

Proof.

Since for any finite n $\begin{aligned} P (lim_{t \to \infty} sup_{i \leq n} | U_{i}^{δ_{t}} - X_{i}^{t} \lor Y_{i}^{t} | > ϵ) \\ \leq P (sup_{i \leq n} | U_{i}^{δ_{t}} - X_{i}^{t} \lor Y_{i}^{t} | > 3 ϵ / 4, i . o .) \\ \leq P (⋂_{t = 1}^{\infty} ⋃_{j = t}^{\infty} \{sup_{i \leq n} | U_{i}^{δ_{j}} - X_{i}^{j} \lor Y_{i}^{j} | > 3 ϵ / 4\}) \\ \leq P (⋂_{t = 1}^{\infty} ⋃_{j = t}^{\infty} ⋃_{i = 1}^{n} {| U_{i}^{δ_{j}} - X_{i}^{j} \lor Y_{i}^{j} | > ϵ / 2}) \\ = lim_{t \to \infty} P (⋃_{j = t}^{\infty} ⋃_{i = 1}^{n} {| U_{i}^{δ_{j}} - X_{i}^{j} \lor Y_{i}^{j} | > ϵ / 2}) \\ = lim_{t \to \infty} P (⋃_{i = 1}^{n} ⋃_{j = t}^{\infty} {| U_{i}^{δ_{j}} - X_{i}^{j} \lor Y_{i}^{j} | > ϵ / 2}) \\ \leq lim_{t \to \infty} \sum_{i = 1}^{n} P (⋃_{j = t}^{\infty} {| U_{i}^{δ_{j}} - X_{i}^{j} \lor Y_{i}^{j} | > ϵ / 2}) \\ = \sum_{i = 1}^{n} lim_{t \to \infty} P (⋃_{j = t}^{\infty} {| U_{i}^{δ_{j}} - X_{i}^{j} \lor Y_{i}^{j} | > ϵ / 2}) \\ \leq n lim_{t \to \infty} γ 2^{- t + 1} = 0. \end{aligned}$

Lemma 4.13

Suppose $δ_{t}$ , $α_{k} (t)$ are defined the same as in Lemma 4.10, let $\begin{aligned} X_{i}^{t} = \underset{| k | > K}{⋁} α_{k} (t) Z_{i - k}, Y_{i}^{t} = \underset{| k | \leq K}{⋁} α_{k} (t) Z_{i - k}, \\ U_{i}^{δ_{t}} = \frac{1}{1 - δ_{t}} Y_{i}^{t}, i = 1, 2, \dots, n \end{aligned}$ then (72) $\begin{aligned} lim_{t \to \infty} P (⋃_{i = 1}^{n} {| U_{i}^{δ_{t}} - X_{i}^{t} \lor Y_{i}^{t} | > ϵ}) = 0. \end{aligned}$ (72)

Proof.

From Lemma 4.6 we have $P (| U_{i}^{δ_{t}} - X_{i}^{t} \lor Y_{i}^{t} | > ϵ) \leq 2 δ_{t} + 1 - e^{- \frac{δ_{t}}{ϵ}}$ for each i. Hence $\begin{aligned} P (⋃_{i = 1}^{n} {| U_{i}^{δ_{t}} - X_{i}^{t} \lor Y_{i}^{t} | > ϵ}) \\ \leq \sum_{i = 1}^{n} P (| U_{i}^{δ_{t}} - X_{i}^{t} \lor Y_{i}^{t} | > ϵ) \\ \leq n (2 δ_{t} + 1 - e^{- \frac{δ_{t}}{ϵ}}) \end{aligned}$ and the right-hand side tends to 0 as $t \to \infty$ because $δ_{t} \to 0$ , which proves (72).

Since (70) implies (73) $\begin{aligned} P (⋃_{i = 1}^{\infty} ⋂_{m = 1}^{\infty} ⋃_{t = m}^{\infty} {| U_{i}^{δ_{t}} - X_{i}^{t} \lor Y_{i}^{t} | > ϵ}) = 0, \end{aligned}$ (73) we now generalise (73) to a more general case and state a theorem which shows how a finite moving range model arbitrarily closely approximates an infinite range moving process. The proof is just a generalisation of the arguments above.

Theorem 4.14

Suppose $a_{l, k, d} \geq 0$ and $\sum_{l = 1}^{\infty} \sum_{k = - \infty}^{\infty} a_{l, k, d} (δ_{d}) = 1, \quad \sum_{\{l, k\} ⊈ K} a_{l, k, d} (δ_{d}) = δ_{d} > 0,$ where $K$ is a finite index set. Let $\begin{aligned} Y_{i d}^{δ_{d}} & = max_{l} max_{k} a_{l, k, d} (δ_{d}) Z_{l, i - k}, d = 1, \dots, D, \end{aligned}$ (74) $\begin{aligned} {\tilde{Y}}_{i δ_{d}} & = max_{\{l, k\} ⫅ K} b_{l, k, d} (δ_{d}) Z_{l, i - k}, d = 1, \dots, D, \end{aligned}$ (75) where $\sum_{\{l, k\} ⫅ K} b_{l, k, d} (δ_{d}) = 1$ for each $d = 1, \dots, D$, and $b_{l, k, d} (δ_{d}) = \frac{1}{1 - δ_{d}} a_{l, k, d} (δ_{d})$ for $\{l, k\} ⫅ K$. Then there exist $\{δ_{m d}\}$, $δ_{m d} \to 0$ as $m \to \infty$, such that $P (⋃_{d = 1}^{D} ⋃_{i = - \infty}^{\infty} ⋂_{t = 1}^{\infty} ⋃_{m = t}^{\infty} \{| {\tilde{Y}}_{i δ_{m d}} - Y_{i d} | > ϵ\}) = 0.$ Therefore, we conclude $\{{\tilde{Y}}_{i δ_{d}}\} \to \{Y_{i d}\}$ for all i and d with probability one.
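The approximation in Theorem 4.14 can be illustrated numerically. The sketch below is not taken from the paper: it assumes a single signature pattern (one l, one d), hypothetical geometric weights $a_{k} \propto ρ^{|k|}$, and a long but finite range $|k| \leq M$ as a stand-in for the infinite range. For each truncation level K it computes the discarded mass δ, rescales the retained weights by $1 / (1 - δ)$ as in the theorem, and reports the largest gap between the truncated and the long-range process on a simulated path.

```python
import random


def unit_frechet(rng):
    """Draw a unit Frechet variate as 1/E with E ~ Exp(1)."""
    e = rng.expovariate(1.0)
    while e == 0.0:  # guard against a (practically impossible) zero draw
        e = rng.expovariate(1.0)
    return 1.0 / e


def moving_max(weights, z, i):
    """Return max_k a_k * Z_{i-k} for weights given as a dict {k: a_k}."""
    return max(a * z[i - k] for k, a in weights.items())


def truncation_errors(rho=0.5, M=60, n=200, Ks=(1, 2, 4, 8, 16), seed=1):
    """Compare the long-range process with its renormalised truncations.

    Returns a list of tuples (K, delta, max_i |Ytilde_i - Y_i|).
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    # z is long enough that i - k is a valid index for M <= i < M + n, |k| <= M
    z = [unit_frechet(rng) for _ in range(n + 2 * M)]
    total = sum(rho ** abs(k) for k in range(-M, M + 1))
    a = {k: rho ** abs(k) / total for k in range(-M, M + 1)}  # sums to 1
    results = []
    for K in Ks:
        delta = sum(w for k, w in a.items() if abs(k) > K)  # mass outside K
        b = {k: w / (1.0 - delta) for k, w in a.items() if abs(k) <= K}
        err = max(abs(moving_max(b, z, i) - moving_max(a, z, i))
                  for i in range(M, M + n))
        results.append((K, delta, err))
    return results


if __name__ == "__main__":
    for K, delta, err in truncation_errors():
        print(f"K={K:2d}  delta={delta:.2e}  max error={err:.3e}")
```

On a fixed path, the gap is bounded by the largest simulated shock times $\max(a_{0} δ, a_{K + 1}) / (1 - δ)$, so it shrinks as K grows and δ decreases.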

4.8. Autoregressive models with additive errors and competing errors

Using the fact that the logarithm of a Fréchet random variable is a Gumbel random variable, Naveau et al. (2011) proposed the following time series model: $\begin{aligned} X_{t, α} = μ + max (γ + α \log (S_{t, α}) + α X_{t - 1, α}, ξ_{t}), \end{aligned}$ (76) where μ is a location parameter, both $X_{t, α}$ and $ξ_{t}$ are Gumbel distributed, and $S_{t, α}$ is positive α-stable distributed. We can regard (76) as a time series model with log positive α-stable noises $\log (S_{t, α})$ and hidden max Gumbel shocks $ξ_{t}$. The idea is as follows. Suppose that $ξ_{t} = - \infty$ for all t. Then model (76) is a pure autoregressive signal process. Alternatively, suppose that $P (ξ_{t} > - \infty) = 1$ in (76) at time t. If the value of $ξ_{t}$ is larger than the signal resulting from the autoregressive signal process, then $ξ_{t}$ is the new observed signal value, i.e., the signal process is altered by a hidden (max) Gumbel shock. This model may also be regarded as an autoregressive model with an infinite number of change points. We note that $\log (S_{t, α}) + X_{t - 1, α}$ is a Gumbel type random variable according to Fougères et al. (2009). As a result, $X_{t, α}$ is Gumbel distributed. A Gumbel distributed random variable can be used to model asymmetric heavy tailed observations, e.g., the deseasonalised weekly maxima of river flow rates in Naveau et al. (2011).
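The logarithm transformation behind this model can be checked in one line. The following is a short worked derivation for the unit Fréchet case only; the symbol Y is introduced here for illustration and is not the paper's notation for this section.

```latex
\begin{aligned}
P(Y \le y) &= \exp(-1/y), \quad y > 0 \quad \text{(unit Fr\'echet)},\\
P(\log Y \le x) &= P(Y \le e^{x}) = \exp\left(-e^{-x}\right), \quad x \in \mathbb{R} \quad \text{(standard Gumbel)}.
\end{aligned}
```

Maxima of Fréchet variables therefore become maxima of Gumbel variables on the log scale, which is why the max operator in the model can be applied directly to Gumbel distributed terms.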

Given its simple autoregressive structure and apparent interpretability, model (76) can serve as an alternative to models (33) and (46).

5. Systematic risks, extreme co-movements and risk contagions

Risk analysis and management permeate almost all aspects of daily life. Building a good risk model and a good risk measure reduces the probability of a system failure. There have been many developments in this subject in many application areas. For example, value at risk (VaR) is a popular risk measure in the banking and insurance industries. Chen et al. (2019) compared several popular risk measures and proposed a new mark to market value at risk (MMVaR) measure to deal with settlements being taken daily during the holding period. For details of VaR and other risk measures, we refer to Chen et al. (2019) and references therein.

Systematic risk (or systemic risk) is a contemporary research topic. Systematic risk can occur in almost every area (system), e.g., flooding, forest fire, earthquake, market crash, financial crisis, economic crisis, and global disease pandemic (like COVID-19), among many others. There are many challenges in modeling systematic risks caused by rare events. The models discussed in Section 4 for extreme values and rare events are suitable for many such applications. In this section, we discuss a recently proposed framework for studying systematic risk in an integrated time series model. We present some computational results on extreme co-movements and risk contagions in the Dow Jones stock market. The methodology can be applied to many other scenarios.

5.1. Autoregressive tail-index models

There are many ways to characterise and describe systematic risk and risk contagions. Various models have been developed for modeling systematic risks. Some recent developments include Kelly (2014), Mao and Zhang (2018), Massacci (2016), Zhang and Schwaab (2017), and Zhao et al. (2018), amongst others. We review one of these models in this section and point out its connections to the models discussed in Section 4.

Consider a system that contains hundreds or thousands of subsystems. If any one subsystem fails, the whole system fails. As such, the systematic risk is the dominating risk from one subsystem among all components at any given time. Examples of such systems include those mentioned at the beginning of Section 5.

The above arguments can be formalised as follows. Suppose a financial system/portfolio contains p stocks. The stock return time series are $\{X_{i t}\}_{i = 1}^{p}, t = 1, \dots, T$. We consider two types of such (high dimensional) multivariate time series. In the first type, $\{X_{i t}\}_{i = 1}^{p}$ is a set of panel time series, and we are interested in modeling the cross-sectional maxima $Q_{t} = max_{1 \leq i \leq p} X_{i t}$. Such problems arise in many applications, including modeling the maximum daily loss across a group of stocks in a portfolio. In the second type, $\{X_{i t}\}_{i = 1}^{p}$ denotes the p intra-period observations of a univariate time series within period t, and we are interested in modeling the intra-period maxima $Q_{t} = max_{1 \leq i \leq p} X_{i t}$. For example, one may be interested in the intra-day maxima of high-frequency trading losses that occur on the same day.

With the established theory in Sections 2 and 3, $Q_{t}$ may be modeled by a GEV distribution or a product of extreme value distributions. For the rest of the paper, we consider the following model: $\begin{aligned} Q_{t} & = μ + σ_{t} Y_{t}^{1 / α_{t}}, \end{aligned}$ (77) $\begin{aligned} \log σ_{t} & = β_{0} + β_{1} \log σ_{t - 1} + β_{2} \exp (- β_{3} Q_{t - 1}), \end{aligned}$ (78) $\begin{aligned} \log α_{t} & = γ_{0} + γ_{1} \log α_{t - 1} + γ_{2} \exp (- γ_{3} Q_{t - 1}), \end{aligned}$ (79) where $\{Y_{t}\}$ is a sequence of i.i.d. unit Fréchet random variables, $0 \leq β_{1} \neq γ_{1} < 1$, $β_{2} < 0$, $β_{3} > 0$, $γ_{2} > 0$, and $γ_{3} > 0$.
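To see how the recursions (77)–(79) interact, a path of $Q_{t}$ can be simulated directly. The sketch below is illustrative only: the parameter values, the starting values for $\log σ_{0}$ and $\log α_{0}$, and the choice $Q_{0} = μ$ are assumptions made here, chosen to satisfy the sign constraints stated above, and are not the fitted values of the paper.

```python
import math
import random


def simulate_tail_index_model(T, mu, beta, gamma,
                              log_sigma0=0.0, log_alpha0=1.0, seed=0):
    """Simulate (Q_t, sigma_t, alpha_t), t = 1..T, from recursions (77)-(79).

    beta = (beta0, beta1, beta2, beta3), gamma = (gamma0, gamma1, gamma2, gamma3).
    Starting values are hypothetical; Q_0 is set to mu.
    """
    b0, b1, b2, b3 = beta
    g0, g1, g2, g3 = gamma
    rng = random.Random(seed)
    log_sigma, log_alpha = log_sigma0, log_alpha0
    q_prev = mu
    path = []
    for _ in range(T):
        log_sigma = b0 + b1 * log_sigma + b2 * math.exp(-b3 * q_prev)  # (78)
        log_alpha = g0 + g1 * log_alpha + g2 * math.exp(-g3 * q_prev)  # (79)
        sigma, alpha = math.exp(log_sigma), math.exp(log_alpha)
        e = rng.expovariate(1.0)
        while e == 0.0:  # guard against a (practically impossible) zero draw
            e = rng.expovariate(1.0)
        y = 1.0 / e  # unit Frechet innovation
        q_prev = mu + sigma * y ** (1.0 / alpha)  # (77)
        path.append((q_prev, sigma, alpha))
    return path


# Illustrative parameters: beta2 < 0, beta3 > 0, gamma2 > 0, gamma3 > 0,
# and 0 <= beta1 != gamma1 < 1.
path = simulate_tail_index_model(
    T=1000, mu=0.0,
    beta=(0.05, 0.9, -0.1, 0.5),
    gamma=(0.1, 0.85, 0.3, 0.5),
)
```

With $β_{2} < 0$ and $γ_{2} > 0$, a large $Q_{t - 1}$ makes both exponential terms small, so $\log σ_{t}$ is pushed up and $\log α_{t}$ is pushed down: after an extreme observation the scale increases and the tail becomes heavier.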

We note that the $Y_{t}$'s are assumed i.i.d. in this section. They could instead be tail (in)dependent and modeled using the models discussed in Section 4, which may increase the modeling accuracy. We leave this task for future research.

We now present an analysis of the negative returns of the Dow Jones 30 (DJI30) stocks. Because two of the stocks were added less than two years ago, the actual number of stocks is 28. The data were downloaded from Yahoo Finance for the time window 1 January 2000 to 21 March 2020. We first fit a GARCH(1,1) model with t distributed innovations to each individual return series. Dividing each negative return series by its fitted volatilities, we obtain a standardised negative return series for each stock. Taking the maximum of the 28 standardised negative returns each day, we obtain the time series $Q_{t}$. We fit model (77)–(79) to $Q_{t}$. The fitted parameter values and standard deviations are presented in Table 2.
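The construction of $Q_{t}$ from the standardised series can be sketched as follows. This is a minimal illustration, not the paper's code: the function name and the toy numbers are hypothetical, and the fitted volatilities are taken as given inputs, since the GARCH(1,1) fitting step is not reproduced here.

```python
def cross_sectional_maxima(returns, vols):
    """Return Q_t = max_i(-r_{it} / sigma_{it}) for each day t.

    returns[i][t] and vols[i][t] are the return and the fitted volatility of
    stock i on day t. Both are hypothetical inputs; in the text the
    volatilities come from GARCH(1,1) fits, which this sketch does not perform.
    """
    n_days = len(returns[0])
    q = []
    for t in range(n_days):
        # standardised negative return (loss) of every stock on day t
        losses = [-r[t] / v[t] for r, v in zip(returns, vols)]
        q.append(max(losses))
    return q


# Toy example with two stocks and three days.
returns = [[0.01, -0.02, 0.005], [-0.03, 0.01, -0.01]]
vols = [[0.01, 0.01, 0.01], [0.02, 0.02, 0.02]]
q = cross_sectional_maxima(returns, vols)
```

In the toy example the daily maxima are approximately 1.5, 2.0 and 0.5, driven by the second, first and second stock respectively.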

Table 2. MLE for cross-sectional maxima of negative standardised daily log-returns for DJI30 from 1 January 2000 to 21 March 2020.


From Table 2, we can see that all coefficients except $γ_{0}$ are significant, which indicates that model (77)–(79) is suitable for the cross-sectional maxima data. Figure 2 plots the recovered tail indexes $\{{\hat{α}}_{t}\}$ (left) and scale parameters $\{{\hat{σ}}_{t}\}$ (right).

Figure 2. Estimated tail indexes $\{{\hat{α}}_{t}\}$ (left) and scale parameters $\{{\hat{σ}}_{t}\}$ (right) from 1 January 2000 to 21 March 2020 for Dow Jones 30.

From Figure 2, we can see that $\{{\hat{α}}_{t}\}$ and $\{{\hat{σ}}_{t}\}$ vary over time, i.e., they cannot be treated as constants. Both are affected by the extreme values observed on previous days. Together with Table 2, one can see that model (77)–(79) describes the extreme movements in the Dow Jones market well. Additional analysis and inference can be carried out using the fitted model, as discussed in Zhao et al. (2018).

We will use the recovered value $Y_{t}$ to study extreme co-movements in the next section.

5.2. Extreme co-movements and risk contagions

Extreme co-movements refer to extreme values co-occurring within a short time period. Risk contagion means that risk variables affect each other at extreme values. We use TQCC to study stock price extreme co-movements and risk contagions among the Dow Jones 30 stocks. In the literature, among many applications, Wu et al. (2012) illustrated the idea of studying equity market index extreme co-movements using TQCC, and Deng and Zhang (2020) used TQCC to study haze extreme contagions in a vast region in China. In their applications, a generalised extreme value (GEV) fitting was implemented. In this section, we adopt a rank transformation using simulated data, as advocated in Zhang et al. (2011). The computation procedure is shown next.

Consider two stocks A and B among the 28 stocks in the Dow Jones 30. Denote their standardised negative return series derived from the GARCH(1,1) fitting as $ϵ_{t}^{A}$ and $ϵ_{t}^{B}$, respectively; denote the sorted (from smallest to largest) series of the recovered $Y_{t}$ series from the fitted model (77) as $Y_{t}^{r}$; denote $Y_{t}^{A} = Y_{rank (ϵ_{t}^{A})}^{r}$.

For k = 1:1000,
- Simulate a sequence of unit Fréchet random variables $\{Z_{t}\}$;
- Sort $\{Z_{t}\}$, and then denote the sorted sequence as $\{Y_{t}^{s}\}$;
- Set $Y_{t}^{B} = Y_{rank (ϵ_{t}^{B})}^{s}$;
- Set $(X_{t}, Y_{t}) = (Y_{t}^{A}, Y_{t}^{B})$;
- Set $u_{n} = min (\text{the 97.5th percentile of } \{Y_{t}^{A}\}, \text{ the 97.5th percentile of } \{Y_{t}^{B}\})$;
- Compute $q_{.975, k}$ using the TQCC formula (21).
After the loop, set $q_{A B} = max_{1 \leq k \leq 1000} q_{.975, k}$.
Repeat the above process for all pairs among the 28 stocks.
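The procedure above can be sketched in a few functions. This is an illustrative reading of the steps, not the author's implementation: the function names are hypothetical, ties in the ranks are broken by position, the percentile uses linear interpolation, and the degenerate 0/0 case of formula (21) is returned as NaN and skipped. All three are assumptions made here.

```python
import random


def tqcc(x, y, u):
    """TQCC of formula (21) for paired samples x, y at threshold u > 0."""
    r_xy = max(max(xi, u) / max(yi, u) for xi, yi in zip(x, y))
    r_yx = max(max(yi, u) / max(xi, u) for xi, yi in zip(x, y))
    denom = r_xy * r_yx - 1.0
    if denom == 0.0:
        return float("nan")  # assumption: the 0/0 case is left undefined
    return (r_xy + r_yx - 2.0) / denom


def ranks(values):
    """0-based ranks (smallest value gets 0); ties broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for pos, i in enumerate(order):
        r[i] = pos
    return r


def percentile(values, p):
    """Linear-interpolation percentile, p in [0, 100]."""
    s = sorted(values)
    h = (len(s) - 1) * p / 100.0
    lo = int(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])


def unit_frechet(rng):
    """Draw a unit Frechet variate as 1/E with E ~ Exp(1)."""
    e = rng.expovariate(1.0)
    while e == 0.0:
        e = rng.expovariate(1.0)
    return 1.0 / e


def q_ab(eps_a, eps_b, y_recovered, n_sim=1000, p=97.5, seed=0):
    """Maximum TQCC over n_sim rank-transformed replications."""
    rng = random.Random(seed)
    n = len(eps_a)
    y_r = sorted(y_recovered)                 # sorted recovered Y_t series
    rank_a, rank_b = ranks(eps_a), ranks(eps_b)
    y_a = [y_r[k] for k in rank_a]            # Y_t^A, fixed across replications
    best = 0.0
    for _ in range(n_sim):
        y_s = sorted(unit_frechet(rng) for _ in range(n))
        y_b = [y_s[k] for k in rank_b]        # Y_t^B for this replication
        u = min(percentile(y_a, p), percentile(y_b, p))
        q = tqcc(y_a, y_b, u)
        if q == q:                            # skip NaN from the 0/0 case
            best = max(best, q)
    return best
```

Calling `q_ab` for every pair of stocks fills a symmetric matrix of $q_{A B}$ values. Because the rank transformation keeps only the ordering of $ϵ_{t}^{A}$ and $ϵ_{t}^{B}$, the result depends on the data only through their ranks and the recovered $Y_{t}$ values.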

For comparison, we also compute linear correlation coefficients $r_{A B}$ between the two standardised time series $ϵ_{t}^{A}$ and $ϵ_{t}^{B}$. We use $q_{A B}$ and $r_{A B}$ to generate the dendrograms in Figures 3 and 4.

Figure 3. Dendrograms based on TQCC (left) and linear correlation coefficients (right) using the complete linkage.

Figure 4. Dendrograms based on TQCC (left) and linear correlation coefficients (right) using the single linkage.

From Figures 3 and 4, we can immediately see that the stock clusters based on TQCC differ from those based on correlation coefficients. Correlation coefficients measure the relationship in the middle part of the data, whereas TQCC can reveal the relationship in the tails. In Figure 3, the left panel based on TQCC can reveal the highest probability that, given one stock price plunges within the left sub-branch of clustered compounds, one stock price also plunges within the right sub-branch of clustered compounds. In Figure 4, the left panel based on TQCC can reveal the smallest probability that, given one stock price plunges within the left sub-branch of clustered compounds, one stock price also plunges within the right sub-branch of clustered compounds. Such information can help investors make better trading decisions and form better portfolios during volatile market movements.

6. Conclusions

In this review paper, a series of models and tail dependence measures have been discussed. These models can be applied to many research studies in which extreme values and rare events are of concern. They can be used as alternatives and/or enhancements to ARMA and GARCH models. They can be further extended to more advanced models to meet the needs of more complex data. In the literature, there are many other models that are excellent candidates for studying extremes, e.g., Brown-Resnick processes (Brown & Resnick, 1977; Huser & Davison, 2013). The new extreme value theory discussed in Section 3 can open a broad area of research. The autoregressive models with additive errors and competing errors and the autoregressive tail-index models can be extended to higher orders and higher dimensions. As to statistical inference, Bayesian inference for these models is also a promising research direction. In the literature on extreme value and moving maxima models, Kunihama et al. (2012) applied a particle filter method to study moving maxima models, and Idowu and Zhang (2017) applied a hybrid MCMC approach to a class of SM4R models. It can be expected that more research using the discussed models will take root in many research areas.

Acknowledgments

The author thanks Editor Jun Shao and two referees for their valuable comments. The work was partially supported by NSF-DMS-1505367 and NSF-DMS-2012298.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was partially supported by NSF - DMS-1505367 and NSF - DMS-2012298.

Notes on contributors

Zhengjun Zhang

Zhengjun Zhang is Professor of Statistics at the University of Wisconsin. His main research areas of expertise are in financial time series and rare event modeling, virtual standard currency, risk management, nonlinear dependence, asymmetric dependence, asymmetric and directed causal inference, gene-gene relationship in rare diseases.

Notes

1 If we let $d = d^{'}$, then the first three cases in the second column of Table correspond to the serial asymptotic dependence index $λ_{d d_{r}}$.

References

Beirlant, J., Goegebeur, Y., Segers, J., & Teugels, J. (2006). Statistics of extremes: Theory and applications. John Wiley & Sons.
Brown, B. M., & Resnick, S. I. (1977). Extreme values of independent stochastic processes. Journal of Applied Probability, 14(4), 732–739. https://doi.org/10.2307/3213346
Bücher, A., & Segers, J. (2017). On the maximum likelihood estimator for the generalized extreme-value distribution. Extremes, 20(4), 839–872. https://doi.org/10.1007/s10687-017-0292-6
Cao, W., & Zhang, Z. (2020). New extreme value theory for maxima of maxima. Statistical Theory and Related Fields. https://doi.org/10.1080/24754269.2020.1846115
Cartwright, D. E. (1958). On estimating the mean energy of sea waves from the highest waves in a record. Proceedings of the Royal Society of London, Series A, 247(1248), 22–28. https://doi.org/10.1098/rspa.1958.0168
Castillo, E., Hadi, A. S., Balakrishnan, N., & Sarabia, J. M. (2005). Extreme value and related models with applications in engineering and science. Wiley Series in Probability and Statistics.
Chen, Y., Wang, Z. C., & Zhang, Z. (2019). Mark to market value at risk. Journal of Econometrics, 208(1), 299–321. https://doi.org/10.1016/j.jeconom.2018.09.017
Coles, S., Bawa, J., Trenner, L., & Dorazio, P. (2001). An introduction to statistical modeling of extreme values (vol. 208). Springer.
Coles, S., Heffernan, J., & Tawn, J. (1999). Dependence measures for extreme value analyses. Extremes, 2(4), 339–365. https://doi.org/10.1023/A:1009963131610
Coles, S. G., & Tawn, J. A. (1991). Modeling extreme multivariate events. Journal of the Royal Statistical Society, Series B, 53(2), 377–392. https://doi.org/10.1111/j.2517-6161.1991.tb01830.x
Davis, R. A., & Resnick, S. I. (1989). Basic properties and prediction of max-ARMA processes. Advances in Applied Probability, 21(4), 781–803. https://doi.org/10.2307/1427767
Davis, R. A., & Resnick, S. I. (1993). Prediction of stationary max-stable processes. The Annals of Applied Probability, 3(2), 497–525. https://doi.org/10.1214/aoap/1177005435
de Haan, L. (1984). A spectral representation for max-stable processes. The Annals of Probability, 12(4), 1194–1204. https://doi.org/10.1214/aop/1176993148
de Haan, L. (1985). Extremes in higher dimensions: The model and some statistics. In Proceedings of the 45th session international statistic institute. International Statistical Institute.
de Haan, L. (1993). Extreme value statistics. In Janos Galambos, James Lechner, & Emil Simiu (Eds.), Extreme value theory and applications (pp. 93–122). Kluwer Academic.
de Haan, L., & Ferreira, A. (2007). Extreme value theory: An introduction. Springer.
de Haan, L., & Resnick, S. I. (1977). Limit theory for multivariate sample extremes. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 40(4), 317–337. https://doi.org/10.1007/BF00533086
Deheuvels, P. (1983). Point processes and multivariate extreme values. Journal of Multivariate Analysis, 13(2), 257–272. https://doi.org/10.1016/0047-259X(83)90025-8
Deng, L., & Zhang, Z. (2018). Assessing the features of extreme smog in China and the differentiated treatment strategy. Proceedings of the Royal Society A, 474(2209), 20170511. https://doi.org/10.1098/rspa.2017.0511
Deng, L., & Zhang, Z. (2020). The haze extreme co-movements in Beijing-Tianjin-Hebei region and its extreme dependence pattern recognitions. Science Progress, 103(2), 36850420916315. https://doi.org/10.1177/0036850420916315
Dey, D. K., & Yan, J. (Eds.). (2016). Extreme value modeling and risk analysis: Methods and applications. Chapman & Hall/CRC.
Drees, H., Ferreira, A., & de Haan, L. (2004). On maximum likelihood estimation of the extreme value index. Annals of Applied Probability, 14(3), 1179–1201. https://doi.org/10.1214/105051604000000279
Embrechts, P., Klüppelberg, C., & Mikosch, T. (1997). Modelling extremal events for insurance and finance. Springer.
Embrechts, P., Lindskog, F., & McNeil, A. (2003). Modelling dependence with copulas and applications to risk management. In S. Rachev (Ed.), Handbook of heavy tailed distributions in finance (pp. 329–384). Elsevier.
Embrechts, P., McNeil, A., & Straumann, D. (2002). Correlation and dependence in risk management: Properties and pitfalls. In M. A. H. Dempster (Ed.), Risk management: Value at risk and beyond (pp. 176–223). Cambridge University Press.
Embrechts, P., Resnick, S. I., & Samorodnitsky, G. (1999). Extreme value theory as a risk management tool. North American Actuarial Journal, 3(2), 30–41. https://doi.org/10.1080/10920277.1999.10595797
Ferreira, H., & Ferreira, M. (2018). Estimating the extremal index through local dependence. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, 54(2), 587–605. https://doi.org/10.1214/16-AIHP815
Finkenstädt, B., & Rootzén, H. E. (2004). Extreme values in finance, telecommunications, and the environment. Chapman & Hall/CRC.
Fougères, A.-L., Nolan, J. P., & Rootzén, H. (2009). Models for dependent extremes using stable mixtures. Scandinavian Journal of Statistics, 36, 42–59. https://doi.org/10.1111/j.1467-9469.2008.00613.x
Galambos, J. (1987). Asymptotic theory of extreme order statistics (2nd ed.). Krieger.
Hall, P., Peng, L., & Yao, Q. (2002). Moving-maximum models for extrema of time series. Journal of Statistical Planning and Inference, 103(1–2), 51–63. https://doi.org/10.1016/S0378-3758(01)00197-5
Heffernan, J. E. (2001). A directory of coefficients of tail dependence. Extremes, 3(3), 279–290. https://doi.org/10.1023/A:1011459127975
Heffernan, J. E., Tawn, J. A., & Zhang, Z. (2007). Asymptotically (in)dependent multivariate maxima of moving maxima processes. Extremes, 10(1-2), 57–82. https://doi.org/10.1007/s10687-007-0035-1
Hsing, T. (1993). Extremal index estimation for a weakly dependent stationary sequence. The Annals of Statistics, 21(4), 2043–2071. https://doi.org/10.1214/aos/1176349409
Huser, R., & Davison, A. C. (2013). Composite likelihood estimation for the Brown–Resnick process. Biometrika, 100(2), 511–518. https://doi.org/10.1093/biomet/ass089
Hüsler, J., & Li, D. (2009). Testing asymptotic independence in bivariate extremes. Journal of Statistical Planning and Inference, 139(3), 990–998. https://doi.org/10.1016/j.jspi.2008.06.003
Idowu, T., & Zhang, Z. (2017). An extended sparse max-linear moving model with application to high-frequency financial data. Statistical Theory and Related Fields, 1(1), 92–111. https://doi.org/10.1080/24754269.2017.1346852
Joe, H. (2014). Dependence modeling with copulas. Chapman & Hall/CRC Monographs on Statistics & Applied Probability. Chapman & Hall/CRC.
Kelly, B. (2014). The dynamic power law model. Extremes, 17(4), 557–583. https://doi.org/10.1007/s10687-014-0193-x
Kunihama, T., Omori, Y., & Zhang, Z. (2012). Efficient estimation and particle filter for max-stable processes. Journal of Time Series Analysis, 33(1), 61–80. https://doi.org/10.1111/jtsa.2011.33.issue-1
Leadbetter, M. R. (1983). Extremes and local dependence in stationary sequences. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 65(2), 291–306. https://doi.org/10.1007/BF00532484
Leadbetter, M. R., Lindgren, G., & Rootzén, H. (1983). Extremes and related properties of random sequences and processes. Springer Science & Business Media.
Leadbetter, M. R., Weissman, I., de Haan, L., & Rootzén, H. (1989). On clustering of high values in statistically stationary series. In J. Sanson (Ed.), Proceedings of the 4th international meeting on statistical climatology. New Zealand Meteorological Service.
Ledford, A. W., & Tawn, J. A. (1996). Statistics for near independence in multivariate extreme values. Biometrika, 83(1), 169–187. https://doi.org/10.1093/biomet/83.1.169
Ledford, A. W., & Tawn, J. A. (1997). Modeling dependence within joint tail regions. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 59(2), 475–499. https://doi.org/10.1111/rssb.1997.59.issue-2
Loynes, R. M. (1965). Extreme values in uniformly mixing stationary stochastic processes. The Annals of Mathematical Statistics, 36(3), 993–999. https://doi.org/10.1214/aoms/1177700071
Mao, G., & Zhang, Z. (2018). Stochastic tail index model for high frequency financial data with Bayesian analysis. Journal of Econometrics, 205(2), 470–487. https://doi.org/10.1016/j.jeconom.2018.03.019
Martins, A. P., & Ferreira, H. (2014). Extremal properties of M4 processes. Test, 23(2), 388–408. https://doi.org/10.1007/s11749-014-0358-6
Massacci, D. (2016). Tail risk dynamics in stock returns: Links to the macroeconomy and global markets connectedness. Management Science, 63(9), 1–18. https://doi.org/10.1287/mnsc.2016.2488
McNeil, A. J., & Frey, R. (2000). Estimation of tail-related risk measures for heteroscedastic financial time series: An extreme value approach. Journal of Empirical Finance, 7(3–4), 271–300. https://doi.org/10.1016/S0927-5398(00)00012-8
Meinguet, T. (2012). Maxima of moving maxima of continuous functions. Extremes, 15(3), 267–297. https://doi.org/10.1007/s10687-011-0136-8
Mikosch, T., Embrechts, P., & Klüppelberg, C. (1997). Modelling extremal events for insurance and finance. Springer Verlag.
Nandagopalan, S. (1990). Multivariate extremes and the estimation of the extremal index [PhD thesis]. University of North Carolina, Chapel Hill, Dept. Of Statistics.
Nandagopalan, S. (1994). On the multivariate extremal index. Journal of Research of the National Institute of Standards and Technology, 99(4), 543–550. https://doi.org/10.6028/jres.099.052
Naveau, P., Zhang, Z., & Zhu, B. (2011). An extension of max autoregressive models. Statistics and Its Interface, 4(2), 253–266. https://doi.org/10.4310/SII.2011.v4.n2.a19
Newell, G. F. (1964). Asymptotic extremes for m-dependent random variables. The Annals of Mathematical Statistics, 35(3), 1322–1325. https://doi.org/10.1214/aoms/1177703288
O'Brien, G. L. (1974). Limit theorems for the maximum term of a stationary process. Annals of Probability, 2(3), 540–545. https://doi.org/10.1214/aop/1176996673
O'Brien, G. L. (1987). Extreme values for stationary and Markov sequences. Annals of Probability, 15(1), 281–291. https://doi.org/10.1214/aop/1176992270
Peng, L. (1999). Estimation of the coefficient of tail dependence in bivariate extremes. Statistics & Probability Letters, 43(4), 399–409. https://doi.org/10.1016/S0167-7152(98)00280-6
Pereira, L., & Fonseca, C. (2019). Statistical methods for assessing the contagion of spatial extreme events among regions. Communications in Statistics - Theory and Methods, 48(13), 3208–3218. https://doi.org/10.1080/03610926.2018.1473612
Pickands, J. (1981). Multivariate extreme value distributions. In Proceedings 43rd session of the international statistical institute (Vol. 49, pp. 859–878).
Reich, B. J., & Shaby, B. A. (2019). A spatial Markov model for climate extremes. Journal of Computational and Graphical Statistics, 28(1), 117–126. https://doi.org/10.1080/10618600.2018.1482764
Resnick, S. I. (1987). Extreme values, regular variation, and point processes. Springer.
Rodgers, J. L., & Nicewander, W. A. (1988). Thirteen ways to look at the correlation coefficient. American Statistician, 42(1), 59–66. https://doi.org/10.2307/2685263
Salmon, F. (2012). The formula that killed wall street. Significance, 9(1), 16–20. https://doi.org/10.1111/j.1740-9713.2012.00538.x
Salvadori, G., De Michele, C., Kottegoda, N. T., & Rosso, R. (2007). Extremes in nature: An approach using copulas. Springer. Complexity.
Sibuya, M. (1959). Bivariate extreme statistics, I. Annals of the Institute of Statistical Mathematics, 11(2), 195–210.
Smith, R. L. (1985). Maximum likelihood estimation in a class of nonregular cases. Biometrika, 72(1), 67–90. https://doi.org/10.1093/biomet/72.1.67
Smith, R. L. (1990). Extreme value theory. Handbook of Applicable Mathematics, 7, 437–471.
Smith, R. L., & Weissman, I. (1994). Estimating the extremal index. Journal of the Royal Statistical Society, Series B, 56(3), 515–528.
Smith, R. L., & Weissman, I. (1996). Characterization and estimation of the multivariate extremal index. Technical report, University of North Carolina-Chapel Hill.
Tang, R., Shao, J., & Zhang, Z. (2013). Sparse moving maxima models for tail dependence in multivariate financial time series. Journal of Statistical Planning and Inference, 143(5), 882–895. https://doi.org/10.1016/j.jspi.2012.11.008
Tawn, J. A. (1988). Bivariate extreme value theory: Models and estimation. Biometrika, 75(3), 397–415.
Weissman, I. (1994). On the extremal index of stationary sequences. Technical report, The Conference on Multivariate Extreme Value Estimation with Applications to Economics and Finance, Erasmus University, Rotterdam.
Wu, J., Zhang, Z., & Zhao, Y. (2012). Study of the tail dependence structure in global financial markets using extreme value theory. Journal of Reviews on Global Economics, 1(1), 62–81.
Yang, X., Frees, E. W., & Zhang, Z. (2011). A generalized beta copula with applications in modeling multivariate long-tailed data. Insurance: Mathematics and Economics, 49(2), 265–284. https://doi.org/10.1016/j.insmatheco.2011.04.007
Zhang, Z. (2005). A new class of tail-dependent time series models and its applications in financial time series. Advances in Econometrics, 20(B), 323–358.
Zhang, Z. (2008a). The estimation of M4 processes with geometric moving patterns. Annals of the Institute of Statistical Mathematics, 60(1), 121–150. https://doi.org/10.1007/s10463-006-0078-0
Zhang, Z. (2008b). Quotient correlation: A sample based alternative to Pearson's correlation. The Annals of Statistics, 36(2), 1007–1030. https://doi.org/10.1214/009053607000000866
Zhang, Z. (2009). On approximating max-stable processes and constructing extremal copula functions. Statistical Inference for Stochastic Processes, 12(1), 89–114. https://doi.org/10.1007/s11203-008-9027-2
Zhang, Z., Qi, Y., & Ma, X. (2011). Asymptotic independence of correlation coefficients with application to testing hypothesis of independence. Electronic Journal of Statistics, 5, 342–372. https://doi.org/10.1214/11-EJS610
Zhang, X., & Schwaab, B. (2017). Tail risk in government bond markets and ECB asset purchases. Working paper.
Zhang, Z., & Smith, R. L. (2004). The behavior of multivariate maxima of moving maxima processes. Journal of Applied Probability, 41(4), 1113–1123. https://doi.org/10.1239/jap/1101840556
Zhang, Z., & Smith, R. L. (2010). On the estimation and application of max-stable processes. Journal of Statistical Planning and Inference, 140(5), 1135–1153. https://doi.org/10.1016/j.jspi.2009.10.014
Zhang, Z., Zhang, C., & Cui, Q. (2017). Random threshold driven tail dependence measures with application to precipitation data analysis. Statistica Sinica, 27(2), 685–709. https://doi.org/10.5705/ss.202015.0421
Zhang, Z., & Zhu, B. (2016). Copula structured M4 processes with application to high-frequency financial data. Journal of Econometrics, 194(2), 231–241. https://doi.org/10.1016/j.jeconom.2016.05.004
Zhao, Z., & Zhang, Z. (2018). Semi-parametric dynamic max-copula model for multivariate time series. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80(2), 409–432. https://doi.org/10.1111/rssb.12256
Zhao, Z., Zhang, Z., & Chen, R. (2018). Modeling maxima with autoregressive conditional Fréchet model. Journal of Econometrics, 207(2), 325–351. https://doi.org/10.1016/j.jeconom.2018.07.004
Zhou, C. (2008). A two-step estimator of the extreme value index. Extremes, 11(3), 281–302. https://doi.org/10.1007/s10687-008-0058-2


