On classes of consistent tests for the Type I Pareto distribution based on a characterization involving order statistics


Abstract

We propose new classes of goodness-of-fit tests for the Pareto Type I distribution. These tests are based on a characterization of the Pareto distribution involving order statistics. We derive the limiting null distribution of the tests and also show that the tests are consistent against fixed alternatives. The finite-sample performance of the newly proposed tests is evaluated and compared to that of some existing tests, and the new tests are found to be competitive in terms of power. The paper concludes with an application to a real-world data set, namely the earnings of the 22 highest-paid participants in the inaugural season of LIV Golf.


1. Introduction

Many real-world phenomena exhibit measurements with heavy-tailed behaviour and, as such, lend themselves to being modelled using the Pareto distribution. First developed by the economist and sociologist Vilfredo Pareto [Citation1], the heavy-tailed Pareto distribution was initially used to model the unequal distribution of wealth in a population, but has found application in a number of other scenarios. Examples of situations modelled using this distribution include studies involving insurance claim premiums [Citation2–4], studies involving medical insurance claims [Citation5], and studies investigating gestation duration [Citation6,Citation7], to name a few (see [Citation8]).

Due to its popularity, this distribution has enjoyed the attention of numerous researchers, resulting in a number of different versions of the Pareto distribution, including the Types I, II, III, IV, and Generalised Pareto distributions. However, in this paper we will focus only on the use of the Type I Pareto distribution, which has cumulative distribution function (CDF) and probability density function (PDF) respectively given by $F_{β, σ} (x) = \begin{cases} 1 - {(\frac{x}{σ})}^{- β}, & x \geq σ \\ 0, & x < σ \end{cases}$ and $f_{β, σ} (x) = \begin{cases} β σ^{β} x^{- β - 1}, & x \geq σ \\ 0, & x < σ, \end{cases}$ where $σ > 0$ and $β > 0$ denote respectively the scale and the shape parameters. The Type I version of the Pareto distribution with $σ = 1$ has a number of practical applications as shown in a variety of old and new research works. In the earlier works, for example, Fisk [Citation9] and Steindl [Citation10] cited several examples of economic data which follow the Type I Pareto distribution, whereas Berger and Mandelbrot [Citation11] proposed using the Type I Pareto distribution in studies of error clusters in communication circuits. It has also been shown to be useful in applications where service times and queuing systems are modelled, as discussed in Harris [Citation12]. More recent applications include using the Type I Pareto distribution to model the wealth distribution in the Forbes 400 list [Citation13], and modelling the city size distribution in the United States [Citation14].
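As an illustration of the definitions above, the CDF, the PDF and an inverse-CDF sampler of the $P (β, σ)$ distribution can be sketched in a few lines of Python. The function names `pareto1_cdf`, `pareto1_pdf` and `pareto1_sample` are hypothetical helpers introduced here for illustration and are not part of the paper; the sampler assumes the standard inversion $X = σ (1 - U)^{- 1 / β}$ with U uniform on $[0, 1)$.

```python
import numpy as np

def pareto1_cdf(x, beta, sigma=1.0):
    """CDF of P(beta, sigma): 1 - (x / sigma)^(-beta) for x >= sigma, else 0."""
    x = np.asarray(x, dtype=float)
    # np.maximum keeps the power well defined below the support, where the CDF is 0
    inside = 1.0 - (np.maximum(x, sigma) / sigma) ** (-beta)
    return np.where(x >= sigma, inside, 0.0)

def pareto1_pdf(x, beta, sigma=1.0):
    """PDF of P(beta, sigma): beta * sigma^beta * x^(-beta - 1) for x >= sigma, else 0."""
    x = np.asarray(x, dtype=float)
    inside = beta * sigma**beta * np.maximum(x, sigma) ** (-beta - 1.0)
    return np.where(x >= sigma, inside, 0.0)

def pareto1_sample(n, beta, sigma=1.0, seed=None):
    """Draw n observations by inverting the CDF (illustrative helper)."""
    rng = np.random.default_rng(seed)
    u = rng.random(n)  # uniform on [0, 1), so 1 - u lies in (0, 1]
    return sigma * (1.0 - u) ** (-1.0 / beta)
```

For example, `pareto1_sample(100, beta=2.0, seed=1)` returns 100 simulated observations from $P (2, 1)$ , all of which are at least 1.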

Hence, under these considerations, it is important to determine whether a realized data set, $x_{1}, \dots, x_{n}$ , from a non-negative random variable X with distribution function $F (x)$ , is well described by a Type I Pareto distribution with parameters β and σ, denoted here by $P (β, σ)$ . In this paper we will therefore consider using a goodness-of-fit (GOF) test to evaluate the following hypotheses regarding these data: $\begin{aligned} H_{0} & : X \text{ follows a } P (β, σ) \text{ distribution; } \exists β, σ > 0 \text{ such that } F (x) = F_{β, σ} (x), x \in [σ, \infty), \\ & \text{and} \\ H_{1} & : X \text{ does not follow a } P (β, σ) \text{ distribution; } ∄ β, σ > 0 \text{ such that } F (x) = F_{β, σ} (x), x \in [σ, \infty) . \end{aligned}$ (1) Before discussing the existing goodness-of-fit tests for the Pareto distribution we briefly introduce the Pareto Types II, III, and IV distributions for completeness. If $X \sim P (β, σ)$ , then the random variable $Z = X + μ - σ$ has a Pareto Type II distribution, with CDF $G_{β, σ, μ} (x) = 1 - {[1 + (\frac{x - μ}{σ})]}^{- β}, x \geq μ,$ (2) where $μ \in R$ is a location parameter. Setting $μ = 0$ in (2) gives a special case of the Pareto Type II distribution, sometimes called the Lomax distribution [Citation15], which often appears in a reparameterised form with $β = 1 / ξ$ and $σ = δ / ξ$ (see [Citation16] as well as Remark 1 of [Citation17]). The Pareto Type IV distribution includes a location parameter, $μ \in R$ , scale parameter, $σ > 0$ , inequality parameter, $γ > 0$ , and shape parameter, $β > 0$ , and has CDF $H_{μ, σ, γ, β} (x) = 1 - {[1 + {(\frac{x - μ}{σ})}^{1 / γ}]}^{- β}, x \geq μ .$ (3) Setting $β = 1$ in (3) results in the CDF of the Pareto Type III distribution. Note that the CDF of the Pareto Type I distribution can also be recovered from (3) by setting $μ = σ$ and $γ = 1$ . By setting $γ = 1$ in (3) the CDF of the Pareto Type II distribution is obtained. For a full discussion on interesting properties of these distributions, as well as the relationships between the Pareto distribution and other distributions, the interested reader is referred to the monograph by Arnold [Citation8].
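The nesting of the Type I and Type II distributions inside the Type IV family can be checked numerically. The sketch below is only an illustration under the parameterisation of (3); the function names are hypothetical and the Type I and Type II CDFs are obtained purely by fixing arguments of the Type IV CDF.

```python
import numpy as np

def pareto4_cdf(x, mu, sigma, gamma, beta):
    """Type IV CDF as in (3): 1 - [1 + ((x - mu) / sigma)^(1 / gamma)]^(-beta), x >= mu."""
    x = np.asarray(x, dtype=float)
    z = np.maximum(x - mu, 0.0) / sigma  # zero below the support, so the CDF is 0 there
    return 1.0 - (1.0 + z ** (1.0 / gamma)) ** (-beta)

def pareto2_cdf(x, mu, sigma, beta):
    """Type II CDF as in (2): the Type IV CDF with gamma = 1."""
    return pareto4_cdf(x, mu, sigma, 1.0, beta)

def pareto1_cdf_from_type4(x, beta, sigma):
    """Type I CDF recovered from (3) by setting mu = sigma and gamma = 1."""
    return pareto4_cdf(x, sigma, sigma, 1.0, beta)
```

With $μ = σ$ and $γ = 1$ the bracket in (3) reduces to $x / σ$ , so `pareto1_cdf_from_type4(x, beta, sigma)` agrees with $1 - (x / σ)^{- β}$ for $x \geq σ$ .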

Several tests have been suggested to check the goodness-of-fit of Pareto distributions; the most commonly used formal goodness-of-fit tests for Pareto distributions are those based on the empirical distribution function (EDF), such as the Kolmogorov-Smirnov (KS) test, Cramér-von Mises (CvM) test, or Anderson-Darling (AD) test. These tests compare the empirical distribution of the data with the hypothesized theoretical Pareto distribution and assess the likelihood that the data were generated by a Pareto distribution. The results of these tests can be used to determine whether the Pareto distribution is a good fit for the data, or whether another distribution may be more appropriate. Goodness-of-fit tests for the Pareto distribution have been discussed in Beirlant et al. [Citation18], Gulati and Shapiro [Citation19], Martynov [Citation20], Rizzo [Citation21], and Falk et al. [Citation16], among others. In Chu et al. [Citation22] a review of established tests for the Generalized Pareto, Pareto Type I and Pareto Type II distributions is provided, whereas goodness-of-fit tests based on a variety of different characterizations for the Pareto distribution can be found in Obradović et al. [Citation23], Obradović [Citation24], Volkova [Citation25], and Milošević and Obradović [Citation26]. Ndwandwe et al. [Citation27] provide an extensive review of the existing goodness-of-fit tests for the Pareto Type I distribution, focussing on the myriad characterizations of this distribution. Although tests specifically developed for Pareto Types II, III, and IV distributions can potentially be used to test for the Type I distribution (by exploiting relationships between these distributions), these tests will not be considered in the Monte Carlo study presented in this paper. In what follows we will refer to the Pareto Type I distribution as just the Pareto distribution.

In this paper we propose new classes of tests for the Pareto distribution. These tests are based on a characterization of the Pareto distribution involving order statistics. In Section 2 we consider the case of the Type I Pareto distribution with unit scale parameter. We present the characterization, introduce the new test statistics, derive the limiting null distribution of the tests and show that they are consistent against fixed alternatives. Section 3 is devoted to a discussion of the general Type I Pareto distribution. In Section 4 we compare the powers of our newly proposed tests with those of some existing tests (in the case of the general Type I Pareto distribution), while Section 5 illustrates the use of the tests to test the hypothesis that the 2022 season's earnings of LIV golfers (exceeding some known threshold) follow a Pareto distribution. The paper concludes in Section 6.

2. The Type I Pareto distribution with unit scale parameter

In this section, we study the case of a Type I Pareto distribution with unit scale parameter, that is, $P (β, 1)$ , $β > 0$ .

2.1. The test statistic

Consider the following characterization of the Pareto distribution denoted here by $P (β, 1)$ , discussed in Allison et al. [Citation28].

Characterisation. Let $X_{1}, \dots, X_{n}$ be independent copies of a non-negative random variable X with common density function f and cumulative distribution function F. Let m be an integer such that $2 \leq m \leq n$ . Then the random variables $X^{\frac{1}{m}}$ and $X_{(1)} = \min \{X_{1}, \dots, X_{m}\}$ have the same distribution if and only if $F (x) = F_{β} (x) = F_{β, 1} (x)$ , $x \in R$ , $β > 0$ .
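The "if" direction of the characterisation is easy to verify: under $P (β, 1)$ , $P (X^{1 / m} > x) = P (X > x^{m}) = x^{- βm}$ and $P (X_{(1)} > x) = [P (X > x)]^{m} = x^{- βm}$ for $x \geq 1$ . The following Python sketch checks this identity and simulates both random variables; the function names and the inverse-CDF sampler are assumptions made here for illustration only.

```python
import numpy as np

def survival_root(x, beta, m):
    """P(X^(1/m) > x) = P(X > x^m) = x^(-beta * m) for x >= 1, when X ~ P(beta, 1)."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 1.0, np.maximum(x, 1.0) ** (-beta * m), 1.0)

def survival_min(x, beta, m):
    """P(min(X_1, ..., X_m) > x) = [P(X > x)]^m for independent copies of X ~ P(beta, 1)."""
    x = np.asarray(x, dtype=float)
    single = np.where(x >= 1.0, np.maximum(x, 1.0) ** (-beta), 1.0)
    return single**m

def simulate_both(n, beta, m, seed=0):
    """Simulate n draws of X^(1/m) and n draws of the minimum of m independent copies."""
    rng = np.random.default_rng(seed)
    x = (1.0 - rng.random(n)) ** (-1.0 / beta)                       # X ~ P(beta, 1)
    mins = ((1.0 - rng.random((n, m))) ** (-1.0 / beta)).min(axis=1)  # X_(1)
    return x ** (1.0 / m), mins
```

Comparing, for instance, the empirical quantiles of the two simulated samples returned by `simulate_both` gives a quick visual check of the equality in distribution.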

From this characterization we have the following theorem.

Theorem 2.1

Let $X_{1}, \dots, X_{n}$ be independent copies of a non-negative random variable X with common density function f and cumulative distribution function F. Let m be an integer such that $2 \leq m \leq n$ . Then the random variables $X^{\frac{1}{m}}$ and $\min \{X_{1}, \dots, X_{m}\}$ have the same distribution if and only if for any $t \in R$ , $E \{\frac{1}{m} \exp (- it X^{\frac{1}{m}}) - {[1 - F (X)]}^{m - 1} \exp (- itX)\} = 0.$ (4)

Proof.

Let $2 \leq m \leq n$ . It is well known that $X_{(1)} = \min \{X_{1}, \dots, X_{m}\}$ has density function $\tilde{f} (x) = m {[1 - F (x)]}^{m - 1} f (x), x \in R .$ It is then clear that the random variables $X^{\frac{1}{m}}$ and $X_{(1)}$ have the same distribution if and only if they have the same characteristic function, that is, if and only if for any $t \in R$ , $\int_{R} \exp (- it x^{\frac{1}{m}}) f (x) d x = m \int_{R} \exp (- itx) {[1 - F (x)]}^{m - 1} f (x) d x .$ It is easy to see that the above equality is equivalent to $\int_{R} \{\frac{1}{m} \exp (- it x^{\frac{1}{m}}) - \exp (- itx) {[1 - F (x)]}^{m - 1}\} f (x) d x = 0,$ which can be written as $E \{\frac{1}{m} \exp (- it X^{\frac{1}{m}}) - {[1 - F (X)]}^{m - 1} \exp (- itX)\} = 0. ■$

Let $w (\cdot)$ be any continuous function satisfying $\begin{aligned} & w (t) > 0, \quad \lim_{t \to \pm \infty} w (t) = 0, \quad t \in R, \quad 0 < \int_{R} w (t) d t < \infty \quad \text{and} \\ & \int_{R} ζ (tx) w (x) d x = 0, \quad t \in R, \end{aligned}$ (5) for any real-valued odd function ζ.

From the characterization and Theorem 2.1 we have that (4) characterizes the $P (β, 1)$ distribution. Thus, suitable normalizations of empirical versions of the expectation in (4) can be used as a basis for the construction of tests for that particular Pareto distribution. To this end, we propose the following test statistic $T_{m, n, w} = \int_{R} {| S_{m, n, {\hat{β}}_{n}} (t) |}^{2} w (t) d t,$ (6) where, using the fact that $1 - F_{β} (x) = x^{- β}$ for $x \geq 1$ , for all $t \in R$ , $S_{m, n, β} (t) = \frac{1}{\sqrt{n}} \sum_{j = 1}^{n} [\frac{1}{m} \exp (- it X_{j}^{\frac{1}{m}}) - X_{j}^{- β (m - 1)} \exp (- it X_{j})],$ and ${\hat{β}}_{n} = n / \sum_{j = 1}^{n} \log (X_{j})$ is the maximum likelihood estimator of β.

Proposition 2.1

Let $2 \leq m \leq n$ . Then $\int_{R} | S_{m, n, {\hat{β}}_{n}} (t) |^{2} w (t) d t = \int_{R} | S_{m, n, {\hat{β}}_{n}}^{†} (t) |^{2} w (t) d t,$ where for all $t \in R$ , $S_{m, n, β}^{†} (t) = \frac{1}{\sqrt{n}} \sum_{j = 1}^{n} [\frac{1}{m} ν (X_{j}^{\frac{1}{m}}; t) - X_{j}^{- β (m - 1)} ν (X_{j}; t)],$ (7) and ν is the function defined on $R \times R$ by $ν (x; t) = \cos (tx) + \sin (tx) .$ (8)

Proof.

Denote by $\bar{z}$ the conjugate of any complex number z. One has the following equalities: $\begin{aligned} \int_{R} | S_{m, n, {\hat{β}}_{n}} (t) |^{2} w (t) d t & = \int_{R} S_{m, n, {\hat{β}}_{n}} (t) \overline{S_{m, n, {\hat{β}}_{n}} (t)} w (t) d t \\ & = \frac{1}{n} \sum_{j = 1}^{n} \sum_{k = 1}^{n} \int_{R} \{\frac{1}{m^{2}} \exp [- it (X_{j}^{\frac{1}{m}} - X_{k}^{\frac{1}{m}})] - \frac{1}{m} X_{k}^{- {\hat{β}}_{n} (m - 1)} \exp [- it (X_{j}^{\frac{1}{m}} - X_{k})] \\ & \quad - \frac{1}{m} X_{j}^{- {\hat{β}}_{n} (m - 1)} \exp [- it (X_{k}^{\frac{1}{m}} - X_{j})] + (X_{j} X_{k})^{- {\hat{β}}_{n} (m - 1)} \exp [- it (X_{j} - X_{k})]\} w (t) d t . \end{aligned}$ By (5), since $x \mapsto \sin (x)$ is an odd function, one has that $\begin{aligned} \int_{R} | S_{m, n, {\hat{β}}_{n}} (t) |^{2} w (t) d t & = \frac{1}{n} \sum_{j = 1}^{n} \sum_{k = 1}^{n} \int_{R} \{\frac{1}{m^{2}} \cos [t (X_{j}^{\frac{1}{m}} - X_{k}^{\frac{1}{m}})] - \frac{1}{m} X_{k}^{- {\hat{β}}_{n} (m - 1)} \cos [t (X_{j}^{\frac{1}{m}} - X_{k})] \\ & \quad - \frac{1}{m} X_{j}^{- {\hat{β}}_{n} (m - 1)} \cos [t (X_{k}^{\frac{1}{m}} - X_{j})] + (X_{j} X_{k})^{- {\hat{β}}_{n} (m - 1)} \cos [t (X_{j} - X_{k})]\} w (t) d t . \end{aligned}$ Using the identity $\cos (a - b) = \cos (a) \cos (b) + \sin (a) \sin (b)$ and the fact that the function $x \mapsto \cos (x) \sin (x)$ is odd, one finally has: $\int_{R} | S_{m, n, {\hat{β}}_{n}} (t) |^{2} w (t) d t = \int_{R} | S_{m, n, {\hat{β}}_{n}}^{†} (t) |^{2} w (t) d t . ■$

For practical applications (and for the Monte-Carlo study in Section 4) we will choose $w (t) = e^{- a | t |}$ and $w (t) = e^{- a t^{2}}$ , the choices of which lead to the following calculable forms of the test statistic: $\begin{aligned} T_{n, m, a}^{(1)} = \frac{1}{n} \sum_{j = 1}^{n} \sum_{k = 1}^{n} [ & \frac{1}{m^{2}} \frac{2 a}{a^{2} + (X_{j}^{\frac{1}{m}} - X_{k}^{\frac{1}{m}})^{2}} - \frac{1}{m} X_{k}^{- {\hat{β}}_{n} (m - 1)} \frac{2 a}{a^{2} + (X_{j}^{\frac{1}{m}} - X_{k})^{2}} \\ & - \frac{1}{m} X_{j}^{- {\hat{β}}_{n} (m - 1)} \frac{2 a}{a^{2} + (X_{k}^{\frac{1}{m}} - X_{j})^{2}} + X_{j}^{- {\hat{β}}_{n} (m - 1)} X_{k}^{- {\hat{β}}_{n} (m - 1)} \frac{2 a}{a^{2} + (X_{j} - X_{k})^{2}}] \end{aligned}$ and $\begin{aligned} T_{n, m, a}^{(2)} = \frac{1}{n} \sqrt{\frac{π}{a}} \sum_{j = 1}^{n} \sum_{k = 1}^{n} [ & \frac{1}{m^{2}} \exp (- \frac{(X_{j}^{\frac{1}{m}} - X_{k}^{\frac{1}{m}})^{2}}{4 a}) - \frac{1}{m} X_{k}^{- {\hat{β}}_{n} (m - 1)} \exp (- \frac{(X_{j}^{\frac{1}{m}} - X_{k})^{2}}{4 a}) \\ & - \frac{1}{m} X_{j}^{- {\hat{β}}_{n} (m - 1)} \exp (- \frac{(X_{k}^{\frac{1}{m}} - X_{j})^{2}}{4 a}) + X_{j}^{- {\hat{β}}_{n} (m - 1)} X_{k}^{- {\hat{β}}_{n} (m - 1)} \exp (- \frac{(X_{j} - X_{k})^{2}}{4 a})], \end{aligned}$ respectively.
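The two closed forms lend themselves to a vectorised implementation. The sketch below is a minimal illustration, not the authors' code: the function name `pareto_cf_statistics` is hypothetical, the data are assumed to lie in $[1, \infty)$ with at least one observation exceeding 1 (so that the estimator of β is finite), and the two cross terms are merged because they coincide after interchanging the summation indices j and k.

```python
import numpy as np

def pareto_cf_statistics(x, m=2, a=1.0):
    """Return (T1, T2), the closed forms for w(t) = exp(-a|t|) and w(t) = exp(-a t^2)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    beta_hat = n / np.log(x).sum()        # MLE of beta under unit scale
    root = x ** (1.0 / m)                 # X_j^(1/m)
    coef = x ** (-beta_hat * (m - 1))     # X_j^(-beta_hat (m - 1))

    d_rr = root[:, None] - root[None, :]  # X_j^(1/m) - X_k^(1/m)
    d_rx = root[:, None] - x[None, :]     # X_j^(1/m) - X_k
    d_xx = x[:, None] - x[None, :]        # X_j - X_k

    def double_sum(kernel):
        first = kernel(d_rr).sum() / m**2
        # the two cross terms of the formula are equal after swapping j and k
        cross = (coef[None, :] * kernel(d_rx)).sum() / m
        last = (coef[:, None] * coef[None, :] * kernel(d_xx)).sum()
        return (first - 2.0 * cross + last) / n

    def laplace_kernel(d):                # integral of cos(t d) exp(-a|t|) dt
        return 2.0 * a / (a**2 + d**2)

    def gauss_kernel(d):                  # integral of cos(t d) exp(-a t^2) dt
        return np.sqrt(np.pi / a) * np.exp(-(d**2) / (4.0 * a))

    return double_sum(laplace_kernel), double_sum(gauss_kernel)
```

The pairwise-difference matrices require memory of order $n^{2}$ , which is unproblematic for the sample sizes typical of goodness-of-fit studies but may call for a blocked computation when n is very large.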

2.2. Large sample properties

In this section we study the asymptotic properties of the newly proposed tests under the null hypothesis as well as under fixed alternatives. It is well known that under $H_{0}$ , the maximum likelihood estimator of β is ${\hat{β}}_{n} = n / \sum_{i = 1}^{n} \log (X_{i})$ and $E ({\hat{β}}_{n}) = nβ / (n - 1)$ .

From this, under $H_{0}$ , one can deduce the following equalities: $\begin{aligned} \sqrt{n} ({\hat{β}}_{n} - β) & = \sqrt{n} (\frac{n}{\sum_{j = 1}^{n} \log (X_{j})} - β) \\ & = \frac{1}{\sqrt{n}} \sum_{j = 1}^{n} [\frac{1}{β} - \log (X_{j})] (β^{2} + \frac{βn}{\sum_{j = 1}^{n} \log (X_{j})} - β^{2}) \\ & = \frac{β^{2}}{\sqrt{n}} \sum_{j = 1}^{n} [\frac{1}{β} - \log (X_{j})] + \frac{1}{\sqrt{n}} \sum_{j = 1}^{n} [\frac{1}{β} - \log (X_{j})] (\frac{βn}{\sum_{j = 1}^{n} \log (X_{j})} - β^{2}) . \end{aligned}$ (9) Now, recall that under $H_{0}$ , for any $j = 1, \dots, n$ , $\log (X_{j})$ follows a gamma distribution with parameters 1 and β, $\sum_{j = 1}^{n} \log (X_{j})$ follows a gamma distribution with parameters n and β, and $1 / \sum_{j = 1}^{n} \log (X_{j})$ follows an inverse gamma distribution with parameters n and β. From this, by the Strong Law of Large Numbers (SLLN) and Slutsky's theorem, one has the following almost sure convergence: $\frac{βn}{\sum_{j = 1}^{n} \log (X_{j})} ⟶ β^{2} ⟺ \frac{βn}{\sum_{j = 1}^{n} \log (X_{j})} - β^{2} ⟶ 0.$ Next, by the Central Limit Theorem, the following convergence in distribution holds: $\frac{1}{\sqrt{n}} \sum_{j = 1}^{n} [\frac{1}{β} - \log (X_{j})] ⟹ N,$ where N is a zero-mean Gaussian random variable with variance $1 / β^{2}$ .

Collecting these two convergence results, one sees that the second term in (9) is $o_{P} (1)$ .
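The decomposition in (9) is an exact algebraic identity for every sample, and the size of its second term can be inspected by simulation. The following Python sketch is an illustration under the assumption that data are simulated from $P (β, 1)$ by inverse-CDF sampling; the function name `mle_decomposition` is hypothetical.

```python
import numpy as np

def mle_decomposition(x, beta):
    """Return (lhs, leading, remainder) of (9) for a sample x and true shape beta."""
    x = np.asarray(x, dtype=float)
    n = x.size
    logs = np.log(x)
    beta_hat = n / logs.sum()                            # MLE of beta
    score = (1.0 / beta - logs).sum() / np.sqrt(n)       # n^(-1/2) sum [1/beta - log X_j]
    leading = beta**2 * score                            # first term of (9)
    remainder = score * (beta * n / logs.sum() - beta**2)  # second term of (9)
    lhs = np.sqrt(n) * (beta_hat - beta)
    return lhs, leading, remainder

# Illustrative use on simulated P(2, 1) data
rng = np.random.default_rng(1)
sample = (1.0 - rng.random(100_000)) ** (-1.0 / 2.0)
lhs, leading, remainder = mle_decomposition(sample, beta=2.0)
```

For any sample, `lhs` equals `leading + remainder` up to rounding error, and for large n the remainder is small relative to the leading term, in line with the $o_{P} (1)$ statement.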

Denote by $C = C (R, R)$ the set of $R$ -valued continuous functions defined on $R$ . Define on $C$ the metric $ρ (x, y) = \sum_{j = 1}^{\infty} 2^{- j} \frac{ρ_{j} (x, y)}{1 + ρ_{j} (x, y)},$ where, for all $j \geq 1$ , $ρ_{j} (x, y) = \sup_{‖ w ‖ \leq j} | x (w) - y (w) | .$ It is well known (see, for example, [Citation29]) that endowed with ρ, $C$ is a separable Fréchet space, and that convergence in this metric corresponds to uniform convergence on all compact sets. That is, for all $x, y \in C$ , $ρ (x, y) = 0 ⟺ \forall j \geq 1, ρ_{j} (x, y) = 0$ . For random elements $x_{n}$ and $y_{n}$ of $C$ , $ρ (x_{n}, y_{n}) \overset{P}{⟶} 0 ⟺ \forall j \geq 1, ρ_{j} (x_{n}, y_{n}) \overset{P}{⟶} 0$ .

Proposition 2.2

Let $2 \leq m \leq n$ . Under $H_{0}$ , in $C$ , as n tends to infinity, $S_{m, n, {\hat{β}}_{n}}^{†} (\cdot) = {\tilde{S}}_{m, n, β} (\cdot) + o_{P} (1),$ where for all $t \in R$ , ${\tilde{S}}_{m, n, β} (t) = \frac{1}{\sqrt{n}} \sum_{j = 1}^{n} \{\frac{1}{m} ν (X_{j}^{\frac{1}{m}}; t) - X_{j}^{- β (m - 1)} ν (X_{j}; t) + [\frac{1}{β} - \log (X_{j})] φ (t)\},$ (10) with φ standing for the function defined for any $t \in R$ by $φ (t) = (m - 1) β^{4} \int_{1}^{\infty} x^{- βm - 2} ν (x; t) d x .$ (11)

Proof.

Write for all $t \in R$ , $S_{m, n, {\hat{β}}_{n}}^{†} (t) = S_{m, n, β}^{†} (t) + {\hat{S}}_{m, n} (t),$ where ${\hat{S}}_{m, n} (t) = \frac{1}{\sqrt{n}} \sum_{j = 1}^{n} (X_{j}^{- β (m - 1)} - X_{j}^{- {\hat{β}}_{n} (m - 1)}) ν (X_{j}; t) .$ Now, from a first-order Taylor expansion, one has $X_{j}^{- β (m - 1)} - X_{j}^{- {\hat{β}}_{n} (m - 1)} = ({\hat{β}}_{n} - β) (m - 1) β X_{j}^{- β (m - 1) - 1} + o_{P} (1) .$ Then, under $H_{0}$ , for all $t \in R$ , one has the equalities: $\begin{aligned} {\hat{S}}_{m, n} (t) & = ({\hat{β}}_{n} - β) (m - 1) β \frac{1}{\sqrt{n}} \sum_{j = 1}^{n} X_{j}^{- β (m - 1) - 1} ν (X_{j}; t) + o_{P} (1) \\ & = \sqrt{n} ({\hat{β}}_{n} - β) [\frac{(m - 1) β}{n} \sum_{j = 1}^{n} X_{j}^{- β (m - 1) - 1} ν (X_{j}; t)] + o_{P} (1) . \end{aligned}$ For all $t \in R$ , define the term in the brackets as $φ_{n} (t)$ . By the law of large numbers, it is easy to see that $φ_{n} (t)$ converges point-wise to $φ (t)$ and that it is equicontinuous on every compact subset of $R$ . Therefore, it converges uniformly to $φ (t)$ on any compact subset Θ of $R$ . This result could also be obtained by applying Proposition 1 of Csörgö [Citation30].

As a consequence of the above convergence, uniformly in $t \in Θ$ , ${\hat{S}}_{m, n} (t) = \frac{1}{\sqrt{n}} \sum_{j = 1}^{n} [\frac{1}{β} - \log (X_{j})] φ (t) + o_{P} (1)$ and, uniformly in $t \in Θ$ , $\begin{aligned} S_{m, n, {\hat{β}}_{n}}^{†} (t) & = S_{m, n, β}^{†} (t) + {\hat{S}}_{m, n} (t) \\ & = \frac{1}{\sqrt{n}} \sum_{j = 1}^{n} \{\frac{1}{m} ν (X_{j}^{\frac{1}{m}}; t) - X_{j}^{- β (m - 1)} ν (X_{j}; t) + [\frac{1}{β} - \log (X_{j})] φ (t)\} + o_{P} (1) \\ & = {\tilde{S}}_{m, n, β} (t) + o_{P} (1) . \end{aligned}$ As this holds for arbitrary compact Θ, one can conclude that as n tends to infinity, in probability, $ρ ({\tilde{S}}_{m, n, β}, S_{m, n, {\hat{β}}_{n}}^{†}) ⟶ 0,$ which establishes the proposition.

Now, let k be the real-valued function defined for any $(x, t) \in R \times R$ by $k (x, t) = \frac{1}{m} ν (x^{\frac{1}{m}}; t) - x^{- β (m - 1)} ν (x; t) + [\frac{1}{β} - \log (x)] φ (t) .$ Also consider the function K defined for any $t \in R$ by $K (t) = \int_{R} k (x, t) d F (x),$ and let $F_{n}$ be the empirical cumulative distribution function of $X_{1}, X_{2}, \dots, X_{n}$ . Then one sees that $ϖ_{n} (t) = \frac{1}{\sqrt{n}} \sum_{j = 1}^{n} [k (X_{j}, t) - K (t)] = \int_{R} k (x, t) d \{\sqrt{n} [F_{n} (x) - F (x)]\} .$ Under suitable conditions (see [Citation30]), $ϖ_{n}$ converges weakly to a zero-mean Gaussian process with covariance kernel $E [ϖ_{n} (t) ϖ_{n} (s)] = \int_{R} k (x, t) k (x, s) d F (x) - K (t) K (s) .$

Theorem 2.2

Let $m \in \{2, \dots, n\}$ be fixed. If $β > 1 / m$ , then under $H_{0}$ , ${\tilde{S}}_{m, n, β} (\cdot)$ converges weakly in $C$ to a zero-mean Gaussian process $S_{m} (\cdot)$ with covariance kernel $Γ_{m}$ defined for any $s, t \in R$ by $\begin{aligned} Γ_{m} (s, t) = β \int_{1}^{\infty} \{ & \frac{1}{m^{2}} ν (x^{\frac{1}{m}}; t) ν (x^{\frac{1}{m}}; s) + x^{- 2 β (m - 1)} ν (x; t) ν (x; s) \\ & + {(\frac{1}{β} - \log (x))}^{2} φ (t) φ (s) - \frac{1}{m} x^{- β (m - 1)} ν (x^{\frac{1}{m}}; t) ν (x; s) \\ & + \frac{1}{m} (\frac{1}{β} - \log (x)) ν (x^{\frac{1}{m}}; t) φ (s) - \frac{1}{m} x^{- β (m - 1)} ν (x; t) ν (x^{\frac{1}{m}}; s) \\ & - x^{- β (m - 1)} (\frac{1}{β} - \log (x)) ν (x; t) φ (s) + \frac{1}{m} (\frac{1}{β} - \log (x)) φ (t) ν (x^{\frac{1}{m}}; s) \\ & - x^{- β (m - 1)} (\frac{1}{β} - \log (x)) φ (t) ν (x; s)\} x^{- β - 1} d x . \end{aligned}$ (12)

Proof.

We prove this result by showing that ${\tilde{S}}_{m, n, β} (\cdot)$ is tight and its finite-dimensional distributions converge to those of any zero-mean Gaussian process with covariance kernel $Γ_{m}$ . For this, we check the conditions (i), (i)* and (ii)* of Csörgö [Citation30].

As $k (x, t)$ is bounded with respect to t on any compact subset of $R$ , so is $| k (x, t) |^{2 + δ}$ , for any $δ > 0$ . Thus, one can find $t_{0} \in Θ$ such that for any $x \geq 1$ , $\sup_{t \in Θ} | k (x, t) |^{2 + δ} = | k (x, t_{0}) |^{2 + δ} .$ Consequently, $\begin{aligned} \int_{R} \sup_{t \in Θ} | k (x, t) |^{2 + δ} d F (x) & = \int_{R} | k (x, t_{0}) |^{2 + δ} d F (x) \\ & \leq \text{Cst} \int_{R} [1 + x^{- (2 + δ) β (m - 1)} + {(\frac{1}{β} + \log (x))}^{2 + δ}] d F (x) \\ & < \infty . \end{aligned}$ This establishes the convergence of the finite-dimensional distributions of ${\tilde{S}}_{m, n, β}$ and the point (i)* of Csörgö [Citation30].

It remains to show (ii)*. One can write: $\begin{aligned} | k (x, t) - k (x, s) | & \leq \frac{1}{m} | ν (x^{\frac{1}{m}}; t) - ν (x^{\frac{1}{m}}; s) | + x^{- β (m - 1)} | ν (x; t) - ν (x; s) | \\ & \quad + | \frac{1}{β} - \log (x) | | φ (t) - φ (s) | . \end{aligned}$ By a first-order Taylor expansion of the function $t \mapsto ν (x; t) = \cos (xt) + \sin (xt)$ , one has: $| k (x, t) - k (x, s) | \leq \frac{x^{\frac{1}{m}}}{m} | t - s | + x^{- β (m - 1) + 1} | t - s | + \text{Cst} | \frac{1}{β} - \log (x) | | t - s | .$ Then, one easily sees that for any $s, t \in Θ$ , one can find $α \in (0, 1]$ such that $\begin{aligned} | k (x, t) - k (x, s) | & \leq | t - s |^{α} \text{Cst} (\frac{x^{\frac{1}{m}}}{m} + x^{- β (m - 1) + 1} + | \frac{1}{β} - \log (x) |) \\ & = | t - s |^{α} M (x, v (t, s)), \end{aligned}$ where v is any Θ-valued function defined on $Θ \times Θ$ , and for any $x \geq 1$ and $t \in Θ$ , M stands for the function defined as $M (x, t) = \text{Cst} (\frac{x^{\frac{1}{m}}}{m} + x^{- β (m - 1) + 1} + | \frac{1}{β} - \log (x) |) .$ Since $M (x, t)$ does not depend on t, $\sup_{t \in Θ} M^{2} (x; t) = M^{2} (x; t)$ and by our assumptions $\int_{R} \sup_{t \in Θ} M^{2} (x; t) d F (x) = \int_{R} M^{2} (x; t) d F (x) < \infty .$ From all these, by Csörgö [Citation30], one can conclude that $\int_{R} k (x, t) d \{\sqrt{n} [F_{n} (x) - F (x)]\}$ converges weakly to a zero-mean Gaussian process with covariance kernel $\int_{R} k (x, t) k (x, s) d F (x) - K (s) K (t) .$ From the equality $\int_{R} k (x, t) d \{\sqrt{n} [F_{n} (x) - F (x)]\} = {\tilde{S}}_{m, n, β} (t) - \sqrt{n} K (t),$ (13) and since $K (t) = 0$ under $H_{0}$ , ${\tilde{S}}_{m, n, β} (\cdot)$ converges weakly to the Gaussian process invoked in the theorem. This establishes the theorem.

Theorem 2.3

Let $m \in \{2, \dots, n\}$ be fixed. Assume that under $H_{1}$ , $E [\log (X_{1})] < \infty$ . Then under $H_{1}$ , in probability, for any $t \in R$ , $S_{m, n, {\hat{β}}_{n}} (t)$ has the same asymptotic behaviour as $\sqrt{n} Q (t)$ , where $\begin{aligned} Q (t) & = E [\frac{1}{m} \exp (- it X_{1}^{\frac{1}{m}}) - [1 - F (X_{1})]^{m - 1} \exp (- it X_{1})] \\ & \quad + E \{[X_{1}^{- \frac{m - 1}{E [\log (X_{1})]}} - [1 - F (X_{1})]^{m - 1}] \exp (- it X_{1})\}, t \in R . \end{aligned}$

Proof.

Define, for any $t \in R$ , ${\dot{S}}_{m, n, F} (t) = \frac{1}{\sqrt{n}} \sum_{j = 1}^{n} [\frac{1}{m} \exp (- it X_{j}^{\frac{1}{m}}) - [1 - F (X_{j})]^{m - 1} \exp (- it X_{j})] .$ Now, adding and subtracting, one has $\frac{S_{m, n, {\hat{β}}_{n}} (t)}{\sqrt{n}} = \frac{{\dot{S}}_{m, n, F} (t)}{\sqrt{n}} + \frac{1}{n} \sum_{j = 1}^{n} \{X_{j}^{- {\hat{β}}_{n} (m - 1)} - [1 - F (X_{j})]^{m - 1}\} \exp (- it X_{j}) + o_{P} (1) .$ Then by the SLLN the first and second terms on the right-hand side of the above equation converge point-wise respectively to $Q_{1} (t) = E [\frac{1}{m} \exp (- it X_{1}^{\frac{1}{m}}) - [1 - F (X_{1})]^{m - 1} \exp (- it X_{1})]$ and $Q_{2} (t) = E \{[X_{1}^{- \frac{m - 1}{E [\log (X_{1})]}} - [1 - F (X_{1})]^{m - 1}] \exp (- it X_{1})\} .$ This establishes the theorem.

Theorem 2.4

Let w be any function satisfying (5). Let $m \in \{2, \dots, n\}$ be fixed.

(i)	If $β > 1 / m$ , then, under $H_{0}$ , as n tends to infinity, in distribution $T_{m, n, w} ⟶ \int_{R} S_{m}^{2} (t) w (t) d t,$ where $S_{m}$ is the Gaussian process invoked in Theorem 2.2.
(ii)	Under $H_{1}$ , if $E [\log (X_{1})] < \infty$ , then as n tends to infinity, $T_{m, n, w} ⟶ \infty .$

Proof.

For Part (i), first observe that since the $X_{i}$ 's are iid, under $H_{0}$ , one has by simple computations $E [{\tilde{S}}_{m, n, β}^{2} (t)] = E \{\frac{1}{m} ν (X_{1}^{\frac{1}{m}}; t) - X_{1}^{- β (m - 1)} ν (X_{1}; t) + [\frac{1}{β} - \log (X_{1})] φ (t)\}^{2} .$ Denote by $B_{r} \subset R$ a ball of radius r, and by ${\bar{B}}_{r}$ its complement in $R$ . Integrating both sides of the above equality with respect to $w (t) d t$ on ${\bar{B}}_{r}$ , one has: $\begin{aligned} & \int_{{\bar{B}}_{r}} E [{\tilde{S}}_{m, n, β}^{2} (t)] w (t) d t \\ & \quad = \int_{{\bar{B}}_{r}} E \{\frac{1}{m} ν (X_{1}^{\frac{1}{m}}; t) - X_{1}^{- β (m - 1)} ν (X_{1}; t) + [\frac{1}{β} - \log (X_{1})] φ (t)\}^{2} w (t) d t . \end{aligned}$ Since the functions $t \mapsto ν (x, t)$ and $t \mapsto φ (t)$ are bounded, and since $w (t) ⟶ 0$ as t tends to infinity, it is easy to see that as r tends to infinity, the right-hand side of the last equality converges to 0. From an adaptation of Theorem 2.3 of Bilodeau and Lafaye de Micheaux [Citation29] with $f (x) = x^{2}$ and $α = 1$ , one has that, under $H_{0}$ , as n tends to infinity, $T_{m, n, w} ⟶ \int_{R} S_{m}^{2} (t) w (t) d t .$ For the proof of the second part, it follows easily from Theorem 2.3 that for large values of n, $T_{m, n, w}$ behaves like $n \int_{R} | Q (t) |^{2} w (t) d t .$ Under $H_{1}$ , there exists a $t_{1} \in R$ such that $Q_{1} (t_{1}) \neq 0$ . The quantity $m Q_{2} (t)$ is the difference between the characteristic function of $X_{(1)} = \min \{X_{1}, \dots, X_{m}\}$ under $H_{0} (F = F_{1 / E [\log (X_{1})]})$ and its characteristic function under $H_{1}$ . Since $F \neq F_{1 / E [\log (X_{1})]}$ under $H_{1}$ , $Q_{2} (t_{2}) \neq 0$ for some $t_{2} \in R$ . This means that under $H_{1}$ , there is some $t_{0} \in R$ for which $Q (t_{0}) \neq 0$ . By the continuity of $| Q |$ , $\int_{R} | Q (t) |^{2} w (t) d t > 0$ . Whence, under $H_{1}$ , as n tends to infinity, in probability, $T_{m, n, w} ⟶ \infty . ■$

Now, assume that $w (\cdot)$ is the density function (with respect to the Lebesgue measure) of some positive measure μ with support $R$ . Let $L_{2} = L_{2} (μ)$ be the collection of functions g defined on $R$ such that $\int_{R} g^{2} (t) d μ (t) < \infty$ . For $h_{1}, h_{2}, h \in L_{2}$ , $⟨ h_{1}, h_{2} ⟩ = \int_{R} h_{1} (t) h_{2} (t) d μ (t)$ and $‖ h ‖_{L_{2}} = ⟨ h, h ⟩^{\frac{1}{2}}$ respectively stand for the usual inner product and norm on $L_{2}$ .

From our assumptions, it is easy to prove that the function $Γ_{m} (s, t)$ defined by (12) is a positive semidefinite kernel. Consequently, the integral operator $\nabla_{Γ_{m}}$ defined on $L_{2}$ by $\nabla_{Γ_{m}} h (t) = \int_{R} Γ_{m} (s, t) h (s) d μ (s), t \in R,$ (14) admits eigenvalues $ξ_{1}, ξ_{2}, \dots$ sorted so that $ξ_{1} \geq ξ_{2} \geq \dots \geq 0$ , and eigenfunctions $f_{1}, f_{2}, \dots$ which form an orthonormal basis for $L_{2}$ .

Corollary 2.1

Under the conditions of Theorem 2.4, under $H_{0}$ , $T_{m, n, w}$ has asymptotically the same distribution as $\sum_{j \geq 1} ξ_{j} χ_{j}^{2}$ , where $ξ_{j}$ and $χ_{j}^{2}, j \geq 1$ , are respectively the eigenvalues of $\nabla_{Γ_{m}}$ and i.i.d. random variables following a chi–squared distribution with one degree of freedom.

Proof.

The Gaussian process $S_{m} (\cdot)$ defined in Theorem 2.2 can be viewed as a random element of $L_{2}$ . Its Karhunen-Loève representation is given by $S_{m} (t) = \sum_{j = 1}^{\infty} G_{j} f_{j} (t), t \in R,$ where for all $j \geq 1$ , $G_{j} = ⟨ S_{m} (\cdot), f_{j} ⟩$ are independent zero-mean Gaussian random variables with variances $ξ_{j}$ . Thus, $‖ S_{m} (\cdot) ‖_{L_{2}}^{2} = \sum_{j = 1}^{\infty} G_{j}^{2}$ . Recalling that $E (G_{j}^{2}) = ξ_{j} \geq 0$ , $j \geq 1$ , for those $ξ_{j}$ 's that are equal to zero, the corresponding $G_{j}$ 's are zero almost surely. For positive $ξ_{j}$ 's, one can observe that $Z_{j} = G_{j} / \sqrt{ξ_{j}}, j \geq 1$ , are iid standard Gaussian random variables. Thus, $\int_{R} S_{m}^{2} (t) d μ (t) = ‖ S_{m} (\cdot) ‖_{L_{2}}^{2} = \sum_{j = 1}^{\infty} ξ_{j} Z_{j}^{2} . ■$

One can approximate the distribution of $\sum_{j = 1}^{\infty} ξ_{j} χ_{j}^{2}$ by that of $\sum_{j = 1}^{J} ξ_{j} χ_{j}^{2}$ for any integer J large enough. Since the $ξ_{j}$ 's are unknown, they can be estimated by the eigenvalues of the operator $\nabla_{{\hat{Γ}}_{m, n}}$ , where ${\hat{Γ}}_{m, n} (s, t)$ is any consistent estimator of $Γ_{m} (s, t)$ . A possible way is to estimate them by the ${\hat{ξ}}_{j}$ 's from the integral equations $\nabla_{{\hat{Γ}}_{m, n}} {\hat{f}}_{j} = {\hat{ξ}}_{j} {\hat{f}}_{j}, j \geq 1.$ A natural estimator of $Γ_{m} (s, t)$ can be obtained by taking the empirical counterpart of the expression given in (12), in which β is replaced by ${\hat{β}}_{n}$ . Some indications on the computation of the cumulative distribution function of $\sum_{j = 1}^{J} {\hat{ξ}}_{j} χ_{j}^{2}$ can be found in Ngatchou-Wandji [Citation31] or in Fan et al. [Citation32]. We will not pursue this further in this paper.
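Although this route is not pursued in the paper, the two steps described above can be sketched numerically. The Python code below is a rough illustration under stated assumptions, not the method of [Citation31] or [Citation32]: `kernel_eigenvalues` approximates the leading eigenvalues of the integral operator by discretising a user-supplied kernel estimate on an equally spaced grid (a Nyström-type scheme chosen here for simplicity), and `weighted_chi2_quantile` approximates a quantile of the truncated weighted chi-squared sum by plain Monte Carlo. Both function names, the grid and the number of draws are hypothetical choices.

```python
import numpy as np

def kernel_eigenvalues(kernel, weight, grid, n_keep=10):
    """Approximate the largest eigenvalues of h -> integral of kernel(s, t) h(s) w(s) ds.

    kernel: vectorised callable (s, t) -> array, e.g. an estimate of the covariance kernel
    weight: vectorised callable t -> w(t)
    grid:   equally spaced points covering the region where w is non-negligible
    """
    grid = np.asarray(grid, dtype=float)
    step = grid[1] - grid[0]                     # assumes an equally spaced grid
    sqrt_w = np.sqrt(weight(grid) * step)
    s, t = np.meshgrid(grid, grid, indexing="ij")
    # symmetrised discretisation: same spectrum as the weighted integral operator
    mat = sqrt_w[:, None] * kernel(s, t) * sqrt_w[None, :]
    eig = np.linalg.eigvalsh(mat)[::-1]          # descending order
    return np.clip(eig[:n_keep], 0.0, None)      # remove tiny negative rounding errors

def weighted_chi2_quantile(eigenvalues, level=0.95, n_draws=200_000, seed=0):
    """Monte Carlo quantile of sum_j xi_j * chi2_j(1) for the supplied eigenvalues."""
    rng = np.random.default_rng(seed)
    eig = np.asarray(eigenvalues, dtype=float)
    draws = rng.chisquare(1, size=(n_draws, eig.size)) @ eig
    return float(np.quantile(draws, level))
```

A critical value at level 5% would then be approximated by `weighted_chi2_quantile(kernel_eigenvalues(gamma_hat, w, grid))`, where `gamma_hat` stands for a hypothetical vectorised estimate of the covariance kernel.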

3. The case of the general Type I Pareto distribution

In this section, we indicate how to treat the case where the observations follow a more general $P (β, σ)$ distribution. That is, we consider testing the following more general hypotheses: $H_{0} : \exists \, β, σ > 0 \text{ such that } F (x) = F_{β, σ} (x), \; x \in R$ and $H_{1} : ∄ \, β, σ > 0 \text{ such that } F (x) = F_{β, σ} (x), \; x \in R,$ where we recall that $F_{β, σ} (x) = \begin{cases} 1 - {(\frac{x}{σ})}^{- β}, & x \geq σ \\ 0, & x < σ. \end{cases}$

Remark 3.1

The above testing problem can be related to that of the preceding sections by using the fact that a non-negative random variable X follows a $P (β, σ)$ distribution if and only if the scaled random variable $X / σ$ follows a $P (β, 1)$ distribution. This can be seen easily by observing that:

if $X \sim P (β, σ)$ then for any $x \geq 1$ , $P (\frac{X}{σ} \leq x) = P (X \leq σx) = 1 - x^{- β}$
if $X / σ \sim P (β, 1)$ then for any $x \geq σ$ , $P (X \leq x) = P (\frac{X}{σ} \leq \frac{x}{σ}) = 1 - {(\frac{x}{σ})}^{- β} .$
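The scaling property in Remark 3.1 can be checked numerically. The sketch below draws from $P (β, σ)$ by inverse-transform sampling and compares the empirical distribution of the scaled values with $F_{β, 1}$ at a single point. The helper names and the parameter values ($β = 2$ , $σ = 3.5$ ) are assumptions made only for this illustration.

```python
import random


def sample_pareto(beta, sigma, n, rng):
    """Inverse-transform sampling: X = sigma * U**(-1/beta), with U in (0, 1]."""
    return [sigma * (1.0 - rng.random()) ** (-1.0 / beta) for _ in range(n)]


def pareto_cdf(x, beta, sigma=1.0):
    """CDF of the Type I Pareto distribution P(beta, sigma)."""
    return 1.0 - (x / sigma) ** (-beta) if x >= sigma else 0.0


rng = random.Random(1)
beta, sigma = 2.0, 3.5  # illustrative values only
y = sample_pareto(beta, sigma, 50000, rng)
scaled = [v / sigma for v in y]

# Empirical P(Y / sigma <= 2) versus F_{beta,1}(2) = 1 - 2**(-beta)
empirical = sum(v <= 2.0 for v in scaled) / len(scaled)
theoretical = pareto_cdf(2.0, beta)
```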

Now, let $Y_{1}, \dots, Y_{n}$ be an independent and identically distributed sample from a $P (β, σ)$ distribution. When σ is known, by Remark 3.1, the current testing problem reduces to the one studied in Section 2 by considering the scaled observations $X_{j} = Y_{j} / σ$ , $j = 1, \dots, n$ .

With this, all the results obtained for $σ = 1$ carry over.

The null distribution of the test statistic can be computed along the same lines as at the end of Section 2.

Also, the consistency of the test can be handled by establishing a result similar to Theorem 2.3 and another similar to the second part of Theorem 2.4.

The case where σ is unknown is more interesting and is the one encountered in practice. Here, it is natural to consider the scaled observations $Y_{j} / {\hat{σ}}_{n}$ , $j = 1, \dots, n$ , where ${\hat{σ}}_{n}$ is a consistent estimator of σ. In the sequel, we use the maximum likelihood estimators of the parameters σ and β, given by ${\hat{σ}}_{n} = \min \{Y_{1}, \dots, Y_{n}\} \quad \text{and} \quad {\hat{β}}_{n} = \frac{n}{\sum_{j = 1}^{n} \log (\frac{Y_{j}}{{\hat{σ}}_{n}})} .$ Letting $X_{j} = Y_{j} / σ$ and observing that $F_{β, σ} (σ) = 0$ and $f_{β, σ} (σ) = β / σ$ , from the Bahadur representation of sample quantiles, one can write $\sqrt{n} ({\hat{σ}}_{n} - σ) = - \sqrt{n} \frac{F_{n} (σ)}{f_{β, σ} (σ)} + o_{P} (1) = - \frac{σ}{β} \frac{1}{\sqrt{n}} \sum_{j = 1}^{n} I (X_{j} \leq 1) + o_{P} (1),$ where we recall that $I (\cdot)$ is the indicator function and $F_{n} (\cdot)$ is here the empirical cumulative distribution function of $Y_{1}, \dots, Y_{n}$ .
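For concreteness, the two maximum likelihood estimators can be computed as in the following minimal Python sketch. The function name `pareto_mle` and the three data values are hypothetical; the code assumes the observations are positive and not all equal.

```python
import math


def pareto_mle(y):
    """Maximum likelihood estimates (beta_hat, sigma_hat) for P(beta, sigma)."""
    sigma_hat = min(y)
    # Sum of log(Y_j / sigma_hat); the smallest observation contributes zero
    log_sum = sum(math.log(v / sigma_hat) for v in y)
    beta_hat = len(y) / log_sum
    return beta_hat, sigma_hat


# Hypothetical data, chosen so that the result is easy to verify by hand:
# sigma_hat = 2 and beta_hat = 3 / (log 2 + log 4) = 1 / log 2
beta_hat, sigma_hat = pareto_mle([2.0, 4.0, 8.0])
```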

Next, by a Taylor expansion (the delta method), one has $\sqrt{n} [\log ({\hat{σ}}_{n}) - \log (σ)] = - \frac{1}{β} \frac{1}{\sqrt{n}} \sum_{j = 1}^{n} I (X_{j} \leq 1) + o_{P} (1) .$ Also, by the Bahadur representation of ${\hat{σ}}_{n}$ , its consistency and the Slutsky theorem, the Bahadur representation of ${\hat{β}}_{n}$ is given by $\begin{aligned} \sqrt{n} ({\hat{β}}_{n} - β) & = \frac{nβ}{\sum_{j = 1}^{n} \log (\frac{Y_{j}}{{\hat{σ}}_{n}})} \Big\{ \sqrt{n} \Big[ \frac{1}{β} - \frac{\sum_{j = 1}^{n} \log (\frac{Y_{j}}{{\hat{σ}}_{n}})}{n} \Big] \Big\} \\ & = \frac{nβ}{n [\log (σ) - \log ({\hat{σ}}_{n})] + \sum_{j = 1}^{n} \log (X_{j})} \\ & \quad \times \Big\{ \sqrt{n} \Big[ \frac{1}{β} - \frac{\sum_{j = 1}^{n} \log (X_{j})}{n} - \log (σ) + \log ({\hat{σ}}_{n}) \Big] \Big\} \\ & = \frac{β^{2}}{\sqrt{n}} \sum_{j = 1}^{n} \Big[ \frac{1}{β} - \log (X_{j}) - \frac{1}{β} I (X_{j} \leq 1) \Big] + o_{P} (1) . \end{aligned}$ (15) Now, the test statistic is $T_{m, n, w} = \int_{R} {| W_{m, n, {\hat{β}}_{n}, {\hat{σ}}_{n}} (t) |}^{2} w (t) d t,$ (16) where for all $t \in R$ $W_{m, n, β, σ} (t) = \frac{1}{\sqrt{n}} \sum_{j = 1}^{n} [\frac{1}{m} \exp (- it X_{j}^{\frac{1}{m}}) - X_{j}^{- β (m - 1)} \exp (- it X_{j})] .$ Then, it is easy to prove along the same lines as for Proposition 2.1 that, given an integer $m \in [2, n]$ , one has $\int_{R} | W_{m, n, {\hat{β}}_{n}, {\hat{σ}}_{n}} (t) |^{2} w (t) d t = \int_{R} | W_{m, n, {\hat{β}}_{n}, {\hat{σ}}_{n}}^{†} (t) |^{2} w (t) d t,$ where for all $t \in R$ , $W_{m, n, β, σ}^{†} (t) = \frac{1}{\sqrt{n}} \sum_{j = 1}^{n} \Big\{ \frac{1}{m} ν (X_{j}^{\frac{1}{m}}; t) - X_{j}^{- β (m - 1)} ν (X_{j}; t) \Big\},$ (17) and $ν (\cdot)$ is the function defined in Proposition 2.1.

For studying the asymptotic distribution of $W_{m, n, {\hat{β}}_{n}, {\hat{σ}}_{n}}^{†} (t)$ , one has to establish the corresponding versions of Proposition 2.2 and Theorem 2.2.

Proposition 3.1

Let $2 \leq m \leq n$ . Under $H_{0}$ , in $C$ , as n tends to infinity, in probability, $W_{m, n, {\hat{β}}_{n}, {\hat{σ}}_{n}}^{†} (\cdot) = {\tilde{W}}_{m, n, β, σ} (\cdot) + o_{P} (1),$ where for all $t \in R$ , $\begin{aligned} {\tilde{W}}_{m, n, β, σ} (t) = \frac{1}{\sqrt{n}} \sum_{j = 1}^{n} \Big\{ & \frac{1}{m} ν (X_{j}^{\frac{1}{m}}; t) - X_{j}^{- β (m - 1)} ν (X_{j}; t) \\ & + [\frac{1}{β} - \log (X_{j}) - \frac{1}{β} I (X_{j} \leq 1)] φ (t) + I (X_{j} \leq 1) ψ (t) \Big\} \end{aligned}$ (18) with φ standing for the function defined in (11), and the function ψ defined for any $t \in R$ by $ψ (t) = \frac{t}{m^{2}} \int_{1}^{\infty} x^{\frac{1}{m} - β - 1} ϑ (x^{\frac{1}{m}}; t) d x,$ (19) where $ϑ (x; t)$ stands for the function $ϑ (x; t) = \cos (xt) - \sin (xt)$ .

Proof.

Write for all $t \in R$ , $W_{m, n, {\hat{β}}_{n}, {\hat{σ}}_{n}} (t) = W_{m, n, β, σ} (t) + {\hat{W}}_{m, n} (t),$ where $\begin{aligned} {\hat{W}}_{m, n} (t) & = \frac{1}{\sqrt{n}} \sum_{j = 1}^{n} [\frac{1}{m} {ν [{(\frac{Y_{j}}{{\hat{σ}}_{n}})}^{\frac{1}{m}}; t] - ν [X_{j}^{\frac{1}{m}}; t]} \\ + (X_{j}^{- β (m - 1)} - X_{j}^{- {\hat{β}}_{n} (m - 1)}) ν (X_{j}; t)] . \end{aligned}$ Now, from a first-order Taylor expansion, one has $ν [{(\frac{Y_{j}}{{\hat{σ}}_{n}})}^{\frac{1}{m}}; t] - ν (X_{j}^{\frac{1}{m}}; t) = ({\hat{σ}}_{n} - σ) \frac{1}{mσ} X_{j}^{\frac{1}{m}} tϑ (X_{j}^{\frac{1}{m}}; t) + o_{P} (1)$ and $X_{j}^{- β (m - 1)} - X_{j}^{- {\hat{β}}_{n} (m - 1)} = ({\hat{β}}_{n} - β) (m - 1) β X_{j}^{- β (m - 1) - 1} + o_{P} (1) .$ Then, under $H_{0}$ , for all $t \in R$ , one has the equalities: $\begin{aligned} {\hat{W}}_{m, n} (t) & = ({\hat{β}}_{n} - β) (m - 1) β \frac{1}{\sqrt{n}} \sum_{j = 1}^{n} X_{j}^{- β (m - 1) - 1} ν (X_{j}; t) \\ + ({\hat{σ}}_{n} - σ) \frac{1}{m^{2} σ} \frac{1}{\sqrt{n}} \sum_{j = 1}^{n} X_{j}^{\frac{1}{m}} tϑ (X_{j}^{\frac{1}{m}}; t) + o_{P} (1) \\ = \sqrt{n} ({\hat{β}}_{n} - β) [\frac{(m - 1) β}{n} \sum_{j = 1}^{n} X_{j}^{- β (m - 1) - 1} ν (X_{j}; t)] \\ + \sqrt{n} ({\hat{σ}}_{n} - σ) [\frac{t}{m^{2} σ} \frac{1}{n} \sum_{j = 1}^{n} X_{j}^{\frac{1}{m}} ϑ (X_{j}^{\frac{1}{m}}; t)] + o_{P} (1) . \end{aligned}$ Multiply the terms in the brackets respectively by $β^{2}$ and $σ / β$ and define the results for all $t \in R$ by $ψ_{1, n} (t)$ and $ψ_{2, n} (t)$ .

By the law of large numbers, it is easy to see that $ψ_{1, n} (t)$ converges point-wise to $φ (t)$ and that it is equicontinuous on every compact subset of $R$ . Therefore, it converges uniformly to $φ (t)$ on any compact subset Θ of $R$ . By the same argument, $ψ_{2, n} (t)$ converges uniformly to $ψ (t)$ .

As a consequence of the above convergence, uniformly in $t \in Θ$ , $\begin{aligned} {\hat{W}}_{m, n} (t) & = \frac{1}{\sqrt{n}} \sum_{j = 1}^{n} {[\frac{1}{β} - \log (X_{j}) - \frac{1}{β} I (X_{j} \leq 1)] φ (t) + I (X_{j} \leq 1) ψ (t)} \\ + o_{P} (1) \end{aligned}$ and uniformly in $t \in Θ$ , $\begin{aligned} W_{m, n, {\hat{β}}_{n}, {\hat{σ}}_{n}}^{†} (t) = W_{m, n, β, σ}^{†} (t) + {\hat{W}}_{m, n} (t) \\ = \frac{1}{\sqrt{n}} \sum_{j = 1}^{n} {\frac{1}{m} ν (X_{j}^{\frac{1}{m}}; t) - X_{j}^{- β (m - 1)} ν (X_{j}; t)} \\ + \frac{1}{\sqrt{n}} \sum_{j = 1}^{n} {[\frac{1}{β} - \log (X_{j}) - \frac{1}{β} I (X_{j} \leq 1)] φ (t) + I (X_{j} \leq 1) ψ (t)} \\ = {\tilde{W}}_{m, n, β, σ} + o_{P} (1) . \end{aligned}$ As this holds for arbitrary compact Θ, one can conclude that as n tends to infinity, in probability, $ρ ({\tilde{W}}_{m, n, β, σ}, W_{m, n, {\hat{β}}_{n}, {\hat{σ}}_{n}}^{†}) ⟶ 0,$ which establishes the proposition.

The corresponding result to Theorem 2.2 is the following:

Theorem 3.1

Let $m \in \{2, \dots, n\}$ be fixed. If $β > 1 / m$ , then under $H_{0}$ , ${\tilde{W}}_{m, n, β, σ} (\cdot)$ converges weakly in $C$ to a zero-mean Gaussian process $W_{m} (\cdot)$ with covariance kernel $Λ_{m}$ defined for any $s, t \in R$ by $\begin{aligned} Λ_{m} (s, t) = β \int_{1}^{\infty} \Big( & \frac{1}{m^{2}} ν (x^{\frac{1}{m}}; t) ν (x^{\frac{1}{m}}; s) + x^{- 2 β (m - 1)} ν (x; t) ν (x; s) \\ & + \{ [\frac{1}{β} - \log (x) - \frac{1}{β} I (x \leq 1)] φ (t) + I (x \leq 1) ψ (t) \} \\ & \times \{ [\frac{1}{β} - \log (x) - \frac{1}{β} I (x \leq 1)] φ (s) + I (x \leq 1) ψ (s) \} \\ & - \frac{1}{m} x^{- β (m - 1)} ν (x; s) ν (x^{\frac{1}{m}}; t) \\ & + \frac{1}{m} \{ [\frac{1}{β} - \log (x) - \frac{1}{β} I (x \leq 1)] φ (s) + I (x \leq 1) ψ (s) \} ν (x^{\frac{1}{m}}; t) \\ & - \frac{1}{m} x^{- β (m - 1)} ν (x; t) ν (x^{\frac{1}{m}}; s) \\ & - x^{- β (m - 1)} ν (x; t) \{ [\frac{1}{β} - \log (x) - \frac{1}{β} I (x \leq 1)] φ (s) + I (x \leq 1) ψ (s) \} \\ & + \frac{1}{m} ν (x^{\frac{1}{m}}; s) \{ [\frac{1}{β} - \log (x) - \frac{1}{β} I (x \leq 1)] φ (t) + I (x \leq 1) ψ (t) \} \\ & - x^{- β (m - 1)} ν (x; s) \{ [\frac{1}{β} - \log (x) - \frac{1}{β} I (x \leq 1)] φ (t) + I (x \leq 1) ψ (t) \} \Big) x^{- β - 1} d x . \end{aligned}$ (20)

Proof.

This result can be proved with the same techniques as in the proof of Theorem 2.2 by using, in the places of the functions $k (x, t)$ and $K (t)$ , the following functions $\begin{aligned} h (x, t) & = \frac{1}{m} ν (x^{\frac{1}{m}}; t) - x^{- β (m - 1)} ν (x; t) \\ + [\frac{1}{β} - \log (x) - \frac{1}{β} I (x \leq 1)] φ (t) + I (x \leq 1) ψ (t) \end{aligned}$ and $H (t) = \int_{R} h (x, t) dF (x) .$ With these, under conditions (i), ${(i)}^{*}$ and ${(ii)}^{*}$ of Csörgö [Citation30] one has that $π_{n} (t) = \frac{1}{\sqrt{n}} \sum_{j = 1}^{n} [h (X_{j}, t) - H (t)]$ converges weakly to a zero-mean Gaussian process with covariance kernel $E [π_{n} (t) π_{n} (s)] = \int_{R} h (x, t) h (x, s) d F (x) - H (t) H (s) .$ We now check these conditions under $H_{0}$ . For this, we follow the same lines as in the proof of Theorem 2.2.

It is clear that $h (x, t)$ is bounded with respect to t on any compact subset of $R$ . Hence, in any compact set Θ there is some $t_{0}$ such that, for any $δ > 0$ , $\int_{R} \sup_{t \in Θ} | h (x, t) |^{2 + δ} d F (x) = \int_{R} | h (x, t_{0}) |^{2 + δ} d F (x) < \infty,$ which shows the convergence of the finite-dimensional distributions of ${\tilde{W}}_{m, n, β, σ}$ and verifies condition (i)* of Csörgö [Citation30].

For checking (ii)*, one can write: $\begin{aligned} | h (x, t) - h (x, s) | & \leq \frac{1}{m} | ν (x^{\frac{1}{m}}; t) - ν (x^{\frac{1}{m}}; s) | + x^{- β (m - 1)} | ν (x; t) - ν (x; s) | \\ & \quad + | \frac{1}{β} - \log (x) - \frac{1}{β} I (x \leq 1) | \, | φ (t) - φ (s) | + | ψ (t) - ψ (s) | . \end{aligned}$ By a first-order Taylor expansion of the functions $t \mapsto ν (x; t) = \cos (xt) + \sin (xt)$ and $t \mapsto ϑ (x; t) = \cos (xt) - \sin (xt)$ , one has: $\begin{aligned} | h (x, t) - h (x, s) | & \leq \frac{x^{\frac{1}{m}}}{m} | t - s | + x^{- β (m - 1) + 1} | t - s | \\ & \quad + \mathrm{Cst} \, (| \frac{1}{β} - \log (x) - \frac{1}{β} I (x \leq 1) | + 1) | t - s | . \end{aligned}$ Then, one easily sees that for any $s, t \in Θ$ , one can find $α \in (0, 1]$ such that $\begin{aligned} | h (x, t) - h (x, s) | & \leq | t - s |^{α} \, \mathrm{Cst} \, (\frac{x^{\frac{1}{m}}}{m} + x^{- β (m - 1) + 1} + | \frac{1}{β} - \log (x) - \frac{1}{β} I (x \leq 1) | + 1) \\ & = | t - s |^{α} M (x, v (t, s)), \end{aligned}$ where v is any Θ-valued function defined on $Θ \times Θ$ , and for any $x \geq 1$ and $t \in Θ$ , M is the function defined as $M (x, t) = \mathrm{Cst} \, (\frac{x^{\frac{1}{m}}}{m} + x^{- β (m - 1) + 1} + | \frac{1}{β} - \log (x) - \frac{1}{β} I (x \leq 1) | + 1) .$ As $M (x, t)$ does not depend on t, $\sup_{t \in Θ} M^{2} (x; t) = M^{2} (x; t)$ and by our assumptions $\int_{R} \sup_{t \in Θ} M^{2} (x; t) d F (x) = \int_{R} M^{2} (x; t) d F (x) < \infty .$ From these and by Csörgö [Citation30], it follows that $π_{n} (t)$ converges weakly to a zero-mean Gaussian process with covariance kernel $\int_{R} h (x, t) h (x, s) d F (x) - H (s) H (t) .$

Given that under $H_{0}$ , $H (t) = 0$ , by the fact that $π_{n} (t) = {\tilde{W}}_{m, n, β, σ} (t) - \sqrt{n} H (t),$ (21) one can conclude that ${\tilde{W}}_{m, n, β, σ} (\cdot)$ converges weakly to the Gaussian process invoked in the theorem. This establishes Theorem 3.1.

For the consistency of the test and the approximation of the quantiles of its null distribution in this case where σ is unknown, results similar to Theorems 2.3 and 2.4 can be stated and proved with the same techniques and arguments.

In what follows, we briefly comment on the application of our method to the Pareto Type II distribution. Recall that if a random variable Z follows a Pareto Type II distribution, its CDF is given by (2). It readily follows that the random variable $Y = Z - μ + σ$ follows a Type I Pareto distribution, $P (β, σ)$ . Given i.i.d. observations $Z_{1}, \dots, Z_{n}$ , testing the null hypothesis of a Pareto Type II distribution is tantamount to testing whether $Z_{1} - μ + σ, \dots, Z_{n} - μ + σ$ follow a $P (β, σ)$ distribution. Thus, it is enough to apply the above results derived for the Pareto Type I distribution. In the case where the parameters are unknown, which is the one encountered in practice, one must replace the unknown parameters by consistent estimators ${\hat{β}}_{n}$ , ${\hat{σ}}_{n}$ and ${\hat{μ}}_{n}$ . However, in this case, the asymptotics would be difficult to treat, one reason being that the Bahadur representations of the estimators may not be easy to handle. We therefore do not pursue this matter further, as the focus of the paper is on the Pareto Type I distribution.

4. Monte Carlo simulation study and results

This section contains the results of a Monte Carlo study in which the finite-sample performance of the newly proposed tests $T_{n, m, a}^{(1)}$ and $T_{n, m, a}^{(2)}$ is compared to that of the following existing tests for the hypothesis in (1), i.e., for the general Type I Pareto distribution:

The traditional Kolmogorov-Smirnov ( $K S_{n}$ ) and Cramer-von Mises ( $C V_{n}$ ) tests.
Two tests proposed by Zhang [Citation33] based on the likelihood ratio, with test statistics given by $Z A_{n} = - \sum_{j = 1}^{n} [\frac{\log \{1 - X_{(j)}^{- {\hat{β}}_{n}}\}}{n - j + \frac{1}{2}} + \frac{\log \{X_{(j)}^{- {\hat{β}}_{n}}\}}{j - \frac{1}{2}}]$ and $Z B_{n} = \sum_{j = 1}^{n} {[\log (\frac{{(1 - X_{(j)}^{- {\hat{β}}_{n}})}^{- 1} - 1}{\frac{n - \frac{1}{2}}{j - \frac{3}{4}} - 1})]}^{2},$ where $X_{(1)} < X_{(2)} < \dots < X_{(n)}$ denote the order statistics of $X_{1}, X_{2}, \dots, X_{n}$ .
A test based on entropy utilizing the Kullback-Leibler divergence measure (see, e.g., [Citation34]). The test statistic is given by $K L_{n, m} = - H_{n, m} - \log ({\hat{β}}_{n}) + ({\hat{β}}_{n} + 1) \frac{1}{n} \sum_{j = 1}^{n} \log (X_{j}),$ where $H_{n, m} = \frac{1}{n} \sum_{j = 1}^{n} \log \{ (\frac{n}{2 m}) (X_{(j + m)} - X_{(j - m)}) \}$ is an estimator for the entropy, with $X_{(j)} = X_{(1)}$ for $j < 1$ , $X_{(j)} = X_{(n)}$ for $j > n$ , and m is a window width subject to $m \leq \frac{n}{2}$ . We implement the test for m = 1 and m = 10.
A test based on the empirical characteristic function proposed by Meintanis [Citation35]. The test is a weighted $L^{2}$ distance between the empirical characteristic function of transformed data and the characteristic function of the standard uniform distribution. Based on the transformation ${\hat{U}}_{j} = F_{{\hat{β}}_{n}} (X_{j}), j = 1, \dots, n$ , the test statistic is $\begin{aligned} M_{n, a} & = \frac{1}{n} \sum_{j, k = 1}^{n} \frac{2 a}{({\hat{U}}_{j} - {\hat{U}}_{k})^{2} + a^{2}} + 2 n [2 \tan^{- 1} (\frac{1}{a}) - a \log (1 + \frac{1}{a^{2}})] \\ & \quad - 4 \sum_{j = 1}^{n} [\tan^{- 1} (\frac{{\hat{U}}_{j}}{a}) + \tan^{- 1} (\frac{1 - {\hat{U}}_{j}}{a})] . \end{aligned}$ The value of the tuning parameter is set to a = 0.5 and a = 1 in order to obtain the Monte Carlo results presented.
A test based on the Mellin transform proposed by Meintanis [Citation36]. The test statistic is given by $\begin{aligned} G_{n, a} & = \frac{1}{n} [({\hat{β}}_{n} + 1)^{2} \sum_{j, k = 1}^{n} I_{w}^{(0)} (X_{j} X_{k}) + \sum_{j, k = 1}^{n} I_{w}^{(2)} (X_{j} X_{k}) + 2 ({\hat{β}}_{n} + 1) \sum_{j, k = 1}^{n} I_{w}^{(1)} (X_{j} X_{k})] \\ & \quad + {\hat{β}}_{n} [n {\hat{β}}_{n} I_{w}^{(0)} (1) - 2 ({\hat{β}}_{n} + 1) \sum_{j = 1}^{n} I_{w}^{(0)} (X_{j}) - 2 \sum_{j = 1}^{n} I_{w}^{(1)} (X_{j})], \end{aligned}$ where $I_{w}^{(m)} (x) = \int_{0}^{\infty} (t - 1)^{m} \frac{1}{x^{t}} w (t) d t, m = 0, 1, 2.$ Choosing $w (t) = e^{- at}$ , one has $\begin{aligned} I_{a}^{(0)} (x) & = (a + \log x)^{- 1}, \\ I_{a}^{(1)} (x) & = \frac{1 - a - \log x}{(a + \log x)^{2}}, \end{aligned}$ and $I_{a}^{(2)} (x) = \frac{2 - 2 a + a^{2} + 2 (a - 1) \log x + \log^{2} x}{(a + \log x)^{3}} .$ We present results for a = 0.5 and a = 2.
A test proposed by Allison et al. [Citation28]. The test statistic measures the difference between the empirical distribution of $\sqrt[m]{X}$ and the V-empirical distribution of $\min \{X_{1}, \dots, X_{m}\}$ , defined as $Δ_{n, m} (x) = \frac{1}{n} \sum_{j = 1}^{n} I \{X_{j}^{\frac{1}{m}} \leq x\} - \frac{1}{n^{m}} \sum_{j_{1}, \dots, j_{m} = 1}^{n} I \{\min (X_{j_{1}}, \dots, X_{j_{m}}) \leq x\} .$ Based on $Δ_{n, m}$ , the authors propose the following test statistic $A 1_{n, m} = \int_{1}^{\infty} Δ_{n, m} (x) d F_{n} (x) .$ We show results for m = 2.
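As an example of how one of the competitor statistics above can be evaluated, the following Python sketch computes $K L_{n, m}$ from the formulas given for the entropy-based test. It assumes the data have already been scaled so that the null model is $P (β, 1)$ , that the observations are distinct, and that ${\hat{β}}_{n}$ is the maximum likelihood estimate; the function name and the data values are hypothetical.

```python
import math


def kl_statistic(x, m=1):
    """Entropy-based statistic KL_{n,m} for data scaled to the P(beta, 1) model."""
    n = len(x)
    mean_log = sum(math.log(v) for v in x) / n
    beta_hat = 1.0 / mean_log  # MLE of beta when sigma = 1
    xs = sorted(x)

    def order_stat(j):
        # 1-based order statistic, with X_(j) = X_(1) for j < 1
        # and X_(j) = X_(n) for j > n
        return xs[min(max(j, 1), n) - 1]

    # Spacing-based entropy estimator H_{n,m}
    h = sum(
        math.log(n / (2.0 * m) * (order_stat(j + m) - order_stat(j - m)))
        for j in range(1, n + 1)
    ) / n
    return -h - math.log(beta_hat) + (beta_hat + 1.0) * mean_log


# Hypothetical scaled observations, for illustration only
sample = [1.1, 1.5, 2.0, 3.0, 4.5, 7.0]
kl_value = kl_statistic(sample, m=1)
```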

4.1. Simulation settings

A Monte Carlo study is carried out to examine the empirical power performance of the tests discussed in the previous sections against various fixed alternative distributions; this includes those listed in Table along with two Pareto mixture distributions. Note that the alternatives listed in Table natively have support $(0, \infty)$ and so we were required to shift these distributions by $σ = 1$ unit to ensure that the simulated data has the same support as the Pareto distribution.

Table 1. Summary of various choices of the alternative distributions.


The first of the two mixture distributions that we use in this study places mixture probability 1−p on the Pareto distribution with parameter $θ = 3$ ( $P (3)$ ) and probability p on the lognormal distribution with parameters $μ = - 2.69$ and $θ = 2$ ( $LN (- 2.69, 2)$ ). The second family of mixture distributions similarly places probability 1−p on the $P (3)$ distribution and probability p on the Weibull distribution with parameter $λ = 0.5$ and $θ = 0.25$ ( $W (0.5, 0.25)$ ). These parameter configurations are chosen to ensure that both distributions used in the mixtures share the same expected values. Random variates from many of these distributions can be obtained using, for example, the R package PoweR [Citation37].
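The claim that the mixture components share the same expected value can be checked with a short calculation. The sketch below rests on several assumptions that are not spelled out in the text: the second lognormal parameter is read as the standard deviation of the logarithm, the Weibull distribution is read as having shape 0.5 and scale 0.25, and both are shifted by one unit as described for the other alternatives in Section 4.1. The function `sample_lognormal_mixture` is a hypothetical helper for drawing from the first mixture.

```python
import math
import random

# Mean of P(3) with sigma = 1: beta / (beta - 1)
pareto_mean = 3.0 / (3.0 - 1.0)

# Mean of the shifted lognormal: 1 + exp(mu + s^2 / 2), ASSUMING s = 2 is
# the standard deviation on the log scale
lognormal_mean = 1.0 + math.exp(-2.69 + 2.0 ** 2 / 2.0)

# Mean of the shifted Weibull: 1 + scale * Gamma(1 + 1 / shape), ASSUMING
# shape = 0.5 and scale = 0.25
weibull_mean = 1.0 + 0.25 * math.gamma(1.0 + 1.0 / 0.5)


def sample_lognormal_mixture(p, n, rng):
    """Draw n values: shifted LN(-2.69, 2) with probability p, else P(3)."""
    values = []
    for _ in range(n):
        if rng.random() < p:
            values.append(1.0 + rng.lognormvariate(-2.69, 2.0))
        else:
            values.append((1.0 - rng.random()) ** (-1.0 / 3.0))
    return values


mixture_sample = sample_lognormal_mixture(0.2, 1000, random.Random(3))
```

Under these assumptions all three means are approximately 1.5, which is consistent with the stated design of the mixtures.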

The results are given in Tables and display the estimated powers (the percentage of times the null hypothesis is rejected in $MC = 20,000$ independent Monte Carlo replications) calculated for sample sizes n = 20 and n = 30. Note that the results from the test statistic $T_{n, m, a}^{(1)}$ are omitted from the simulation results as they were found to be similar to $T_{n, m, a}^{(2)}$ . The ‘warp-speed’ bootstrap method [Citation38] is employed in order to simultaneously calculate the bootstrap critical values and Monte Carlo approximations of the power. In addition, all results are calculated by estimating the parameters using either maximum likelihood estimation (MLE), ${\hat{σ}}_{MLE} = min (X_{1}, \dots, X_{n}) and {\hat{β}}_{MLE} = \frac{n}{\sum_{i = 1}^{n} \log (X_{i} / {\hat{σ}}_{MLE})}$ or method of moments estimation (MME), ${\hat{σ}}_{MME} = \frac{{\bar{X}}_{n} ({\hat{β}}_{MME} - 1)}{{\hat{β}}_{MME}} and {\hat{β}}_{MME} = \frac{n \bar{X} - min (X_{1}, \dots, X_{n})}{n ({\bar{X}}_{n} - min (X_{1}, \dots, X_{n}))} .$ The results obtained when MLE is employed are given in Tables and , whereas the results associated with MME are given in Tables and . A significance level of 5 % is used throughout the study and all calculations were executed using R v4.2.2 [Citation39]. Note that all the code used in these simulations can be found here:
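The method of moments estimators quoted above translate directly into code. The following Python sketch is only an illustration (the simulations in the paper were run in R); the function name `pareto_mme` and the three data values are hypothetical, and the code assumes the observations are not all equal.

```python
def pareto_mme(x):
    """Method of moments estimates (beta_hat, sigma_hat) for P(beta, sigma)."""
    n = len(x)
    mean = sum(x) / n
    smallest = min(x)
    beta_hat = (n * mean - smallest) / (n * (mean - smallest))
    sigma_hat = mean * (beta_hat - 1.0) / beta_hat
    return beta_hat, sigma_hat


# Hypothetical data: n = 3, mean = 2, minimum = 1, so that
# beta_hat = (6 - 1) / (3 * 1) = 5/3 and sigma_hat = 2 * (2/3) / (5/3) = 0.8
beta_mme, sigma_mme = pareto_mme([1.0, 2.0, 3.0])
```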

The empirical power results for the two test statistics developed in this paper make use of specific configurations of the tuning parameters m and a, as these combinations were found to produce high powers in a separate exploratory preliminary simulation. Specifically, for $T^{(2)}$ , the parameter settings used are the pairings of m = 2 & a = 0.5, m = 3 & a = 0.5 and m = 4 & a = 0.5.

4.2. Simulation results

We begin by discussing the results obtained using MLE estimators for the parameters for n = 20 and n = 30, respectively, and then move on to the results obtained when using MME estimation. The two highest values for each alternative distribution (rows) are highlighted to make comparison easier. From Tables it is clear that all the tests closely maintain the nominal significance level of 5%.

When considering the estimated powers under MLE estimation, the test $K L_{n, 10}$ does best for the majority of the alternatives given in Table , closely followed by $T_{n, 2, 0.5}^{(2)}$ . However, we note that while the $K L_{n, 10}$ test has good performance among these non-mixture alternatives, it has remarkably poor performance when applied to the mixture distributions; displaying almost no power for this sequence of alternatives. When considering the two mixture alternatives, the statistic $T_{n, 4, 0.5}^{(2)}$ produces some of the highest estimated powers across all mixing proportions used. In particular, it has the best performance for the Pareto-log-normal mixture alternative and is tied for the best performing test statistic (with $Z A_{n}$ and $Z B_{n}$ ) for the Pareto-Weibull mixture.

Turning our attention to the performance of the tests under the MME estimation scheme, we find that the $G_{n, 2}$ test statistic generally produces the best power performance among the non-mixture alternatives, followed, once again, by $T_{n, 2, 0.5}^{(2)}$ . We note that, in stark contrast to the powers obtained under MLE estimation, the $K L_{n, 10}$ test is not competitive at all when using the MME estimation technique for the alternatives in Table . For the mixture alternatives, the test proposed in this paper produces some of the highest estimated powers for almost all of their parameter configurations. In particular, our test with setting m = 4 and a = 0.5 performs best for both of these mixture alternatives under almost all mixing proportions considered.

It is clear that the proposed test produces higher estimated powers for the alternatives in Table when using MME estimation compared to the corresponding powers obtained under MLE estimation. Conversely, these tests calculated under MLE have much higher powers than their MME counterparts when considering the mixture alternatives. As is common with these kinds of analyses, we conclude that there is no single test with the best overall power performance, but we find that our proposed test is competitive against the majority of other tests encountered in the literature when either MLE or MME estimation is used; this is true for both mixed and non-mixed alternatives.

Finally, it is clear from Tables that, for the test statistic $T_{n, m, a}^{(2)}$ , the choice of the parameters m and a can have a pronounced impact on the estimated powers. This fluctuation in power performance is briefly explored in Table for the ‘Benini’ and ‘log-Weibull’ alternatives using a variety of different configurations of the parameters m and a. These particular alternatives were selected because they can be considered local alternatives: setting the parameter θ to zero yields the Pareto distribution, with deviations from the Pareto occurring when $θ > 0$ . The powers are obtained using a sample size of n = 20 and the parameters σ and β are estimated using MME. From Table it is clear that, in general, the highest powers are associated with smaller values of both m and a; the powers taper off as both m and a increase. This tendency is observed for all six alternatives considered here and was also found to hold for many of the other alternatives considered in the main simulation study. There were a few exceptions, but we report only these results as they are representative of the common trend observed.

Table 6. Estimated powers of the test statistic $T_{n, m, a}^{(2)}$ for varying choices of a and m (using MME estimation and sample size n = 20) for 6 alternatives.


5. Practical data application

To further investigate the behaviour of the test statistics studied in this paper, we now present the results obtained by applying those tests to a practical data set. The data set concerns the earnings for the 2022 inaugural season of LIV golf. LIV golf is a new golf series backed by the Saudi Arabian sovereign wealth fund which aims to be an alternative to the PGA Tour by attracting star players and providing larger paydays for winners. The data were obtained from www.spotrac.com [Citation40] and comprise the player earnings in the 2022 season. The data are shown in Table and a box-plot of these values is provided in Figure .

Figure 1. Box-plot of the LIV golf 2022 earnings. The dashed line represents the threshold of $3,500,000.

Table 7. Data set: LIV golf earnings data set (accessed 2023-09-11).


It is clear from Figure that the data values have a right-skewed tendency, with some extremely large outliers. The Pareto distribution is a distribution with heavy tails and is often used to model income above a specified threshold, so it is sensible to also consider this distribution as a possible model for these earnings values above a threshold. We therefore focus our attention on the LIV golf earnings above the threshold of 3,500,000, indicated in Figure with a dashed grey line. The values above the threshold are extracted from the data set, scaled by dividing them by the known value of 3,500,000, and their empirical distribution is plotted in Figure . Overlaid on this empirical distribution plot are two parametric Pareto distributions: one where the parameter is estimated using MLE (producing the estimate ${\hat{β}}_{n} = 1.781$ ) and the other where the parameter is estimated using MME (producing the estimate ${\hat{β}}_{n} = 1.932$ ). From these figures it seems that the Pareto might be a good fit for the data, but to test this assertion more formally we now apply all of the tests discussed in this paper to the above-threshold, scaled LIV golf earnings data, with the results found in Table . The estimated p-values reported in Table were obtained by making use of a parametric bootstrap employing $B = 10,000$ bootstrap replications from a Pareto distribution with parameter estimated using either the MLE or the MME. From these p-values it is clear that none of the tests rejects the null hypothesis that the 2022 season's earnings of LIV golfers exceeding 3,500,000 follow a Pareto distribution.
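The parametric bootstrap used to obtain such p-values can be sketched as follows. This is a simplified illustration rather than the code used for the paper: it assumes the data are already scaled by a known threshold (so that the null model is $P (β, 1)$ ), estimates β by maximum likelihood, and uses a Kolmogorov-Smirnov type distance as a stand-in statistic. The data values are hypothetical and are not the LIV golf earnings, and a smaller default number of replications than $B = 10,000$ is used to keep the example fast.

```python
import math
import random


def beta_mle(x):
    """MLE of beta for data scaled to the P(beta, 1) model."""
    return len(x) / sum(math.log(v) for v in x)


def ks_statistic(x):
    """Kolmogorov-Smirnov distance between the empirical CDF and the
    fitted P(beta_hat, 1) CDF."""
    n = len(x)
    beta_hat = beta_mle(x)
    d = 0.0
    for j, v in enumerate(sorted(x), start=1):
        f = 1.0 - v ** (-beta_hat)
        d = max(d, j / n - f, f - (j - 1) / n)
    return d


def bootstrap_p_value(x, statistic, n_boot=1000, seed=0):
    """Parametric bootstrap p-value: resample from P(beta_hat, 1) and
    re-estimate beta inside the statistic for each resample."""
    rng = random.Random(seed)
    n = len(x)
    beta_hat = beta_mle(x)
    observed = statistic(x)
    exceed = 0
    for _ in range(n_boot):
        resample = [(1.0 - rng.random()) ** (-1.0 / beta_hat) for _ in range(n)]
        if statistic(resample) >= observed:
            exceed += 1
    return exceed / n_boot


# Hypothetical scaled observations, for illustration only
scaled_data = [1.05, 1.2, 1.4, 1.9, 2.6, 3.3, 5.1, 8.0]
p_value = bootstrap_p_value(scaled_data, ks_statistic, n_boot=500)
```

Any of the statistics discussed in this paper could be passed in place of `ks_statistic`, provided it re-estimates the parameter from each bootstrap sample.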

Figure 2. Empirical distribution function of the scaled above-threshold LIV golf 2022 earnings with two Pareto distribution overlays (one where the parameter is estimated via MLE and the other where it is estimated via MME).

Table 8. p-values for the LIV golf earnings data set (accessed 2023-09-11).


6. Concluding remarks

In this paper, we propose two new classes of goodness-of-fit tests for the Pareto Type I distribution based on a characterization involving order statistics. In addition, we derive the limiting null distribution of these test statistics and show that the tests are consistent. A Monte Carlo simulation study is presented to demonstrate the finite-sample performance of these tests under a variety of alternative distributions. Through the inclusion of other similar tests for the Pareto distribution, the simulation study also demonstrates that our tests are competitive when compared to these other tests. Finally, the choice of the two tuning parameters appearing in these tests was briefly explored, with the finding that, when the tests are to be implemented in a practical setting, the choices a = 0.5 and m = 2 or m = 3 can be recommended.

Acknowledgments

The work of the third and fourth authors is based on research supported by the National Research Foundation (NRF). Any opinion, finding and conclusion or recommendation expressed in this material is that of the authors and the NRF does not accept any liability in this regard.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

Pareto V. The new theories of economics. J Polit Econ. 1897;4:485–502. doi: 10.1086/250454
Beirlant J, de Wet T, Goegebeur Y. Pricing risk when distributions are fat tailed. J Appl Probab. 2004;41A:157–175.
Brazauskas V. Robust parametric modeling of the proportional reinsurance premium when claims are approximately Pareto-distributed. Proc Business Economic Stats Sect. 2000;A:144–149.
Mert M, Saykan Y. On a bonus-malus system where the claim frequency distribution is geometric and the claim severity distribution is Pareto. Hacet J Math Stat. 2005;34:75–81.
Zisheng O, Chi X. Generalized Pareto distribution fit to medical insurance claims data. Appl Math J Chinese Universities. 2006;21:21–29. doi: 10.1007/s11766-996-0018-z
Keiding N, Hansen OKH, Sorensen DN, et al. The current duration approach to estimating time to pregnancy. Scand J Stat. 2012;39:185–204. doi: 10.1111/sjos.2012.39.issue-2
Keiding N, Kvist K, Hartvig H, et al. Estimating time to pregnancy from current durations in a cross-sectional sample. Biostatistics. 2002;3:565–578.
Arnold B. Pareto distributions. Boca Raton, FL: CRC Press, Taylor and Francis Group; 2015.
Fisk P. The graduation of income distributions. Econometrica. 1961;29:171–185. doi: 10.2307/1909287
Steindl J. Random processes and the growth of firms. Madison, WI: Hafner Publishing; 1965.
Berger J, Mandelbrot B. A new model for error clustering in telephone circuits. IBM J Res Dev. 1963;7:224–236. doi: 10.1147/rd.73.0224
Harris CM. The Pareto distribution as a queue service discipline. Oper Res. 1968;16:307–313. doi: 10.1287/opre.16.2.307
Klass O, Biham O, Levy M, et al. The Forbes 400 and the Pareto wealth distribution. Econ Lett. 2006;90:290–295. doi: 10.1016/j.econlet.2005.08.020
Ioannides Y, Skouras S. US city size distribution: Robustly Pareto, but only in the tail. J Urban Econ. 2013;73:18–29. doi: 10.1016/j.jue.2012.06.005
Lomax K. Business failures: another example of the analysis of failure data. J Am Stat Assoc. 1954;49:847–852. doi: 10.1080/01621459.1954.10501239
Falk M, Guillou A, Toulemonde G. A LAN based Neyman smooth test for Pareto distributions. J Stat Plan Inference. 2008;138(10):2867–2886. doi: 10.1016/j.jspi.2007.10.007
Charpentier A, Flachaire E. Pareto models for risk management. Working Papers hal-02423805, HAL. 2019. Available from: https://ideas.repec.org/p/hal/wpaper/hal-02423805.html.
Beirlant J, de Wet T, Goegebeur Y. A goodness-of-fit statistic for the Pareto-type behavior. J Comput Appl Math. 2006;186:99–116. doi: 10.1016/j.cam.2005.01.036
Gulati S, Shapiro S. Goodness-of-fit tests for Pareto distribution. In: Statistical models and methods for biomedical and technical systems. Boston, MA: Springer; 2008. p. 259–274.
Martynov G. Cramér-von Mises test for the Weibull and Pareto distributions. In: Proceedings of Dobrushin International Conference Moscow, Moscow, Russia, 2009. p. 117–122.
Rizzo ML. New goodness-of-fit tests for Pareto distributions. Astin Bull. 2009;39(2):691–715. doi: 10.2143/AST.39.2.2044654
Chu J, Dickin S, Nadarajah S. A review of goodness of fit tests for Pareto distributions. J Comput Appl Math. 2019;361:13–41. doi: 10.1016/j.cam.2019.04.018
Obradović M, Jovanović M, Milošević B. Goodness-of-fit tests for Pareto distribution based on a characterization and their asymptotics. Statistics. 2015;49:1026–1041. doi: 10.1080/02331888.2014.919297
Obradović M. On asymptotic efficiency of goodness of fit tests for Pareto distribution based on characterizations. Filomat. 2015;29:2311–2324. doi: 10.2298/FIL1510311O
Volkova K. Goodness-of-fit tests for Pareto distribution based on its characterization. Stat Methods Appl. 2016;25:351–373. doi: 10.1007/s10260-015-0330-y
Milošević B, Obradović M. Two-dimensional Kolmogorov-type goodness-of-fit tests based on characterizations and their asymptotic efficiencies. J Nonparametr Stat. 2016;28:413–427. doi: 10.1080/10485252.2016.1163358
Ndwandwe L, Allison J, Santana L, et al. Testing for the Pareto type I distribution: A comparative study. 2022. arXiv preprint arXiv:2211.10088.
Allison J, Milošević B, Obradović M, et al. Distribution-free goodness-of-fit tests for the Pareto distribution based on a characterization. Comput Stat. 2022;37:403–418. doi: 10.1007/s00180-021-01126-y
Bilodeau M, Lafaye de Micheaux P. A multivariate empirical characteristic function test of independence with normal marginals. J Multivar Anal. 2005;95(2):345–369. doi: 10.1016/j.jmva.2004.08.011
Csörgö S. Kernel-transform empirical processes. J Multivar Anal. 1983;13:517–533. doi: 10.1016/0047-259X(83)90037-4
Ngatchou-Wandji J. Testing for symmetry in multivariate distributions. Stat Methodol. 2009;6(3):230–250. doi: 10.1016/j.stamet.2008.09.003
Fan Y, Lafaye de Micheaux P, Penev S, et al. Multivariate nonparametric test of independence. J Multivar Anal. 2017;153:189–210. doi: 10.1016/j.jmva.2016.09.014
Zhang J. Powerful goodness-of-fit tests based on the likelihood ratio. J R Stat Soc B Stat Methodol. 2002;64(2):281–294. doi: 10.1111/1467-9868.00337
Ahrari V, Baratpour S, Habibirad A, et al. Goodness of fit tests for Rayleigh distribution based on quantiles. Commun Stat Simul Comput. 2022;51(2):341–357. doi: 10.1080/03610918.2019.1651336
Meintanis S. A unified approach of testing for discrete and continuous Pareto laws. Stat Papers. 2009;50(3):569–580. doi: 10.1007/s00362-007-0103-2
Meintanis S. Goodness-of-fit tests and minimum distance estimation via optimal transformation to uniformity. J Stat Plan Inference. 2009;139(2):100–108. doi: 10.1016/j.jspi.2008.03.037
Lafaye de Micheaux P, Tran VA. PoweR: A reproducible research tool to ease Monte Carlo power simulation studies for goodness-of-fit tests in R. J Stat Softw. 2016;69(3):1–42. doi: 10.18637/jss.v069.i03
Giacomini R, Politis DN, White H. A warp-speed method for conducting Monte Carlo experiments involving bootstrap estimators. Econ Theory. 2013;29(3):567–589. doi: 10.1017/S0266466612000655
R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. 2021. Available from: https://www.R-project.org/.
www.spotrac.com. LIV golf results by year. [accessed 2023 September 11]. Available from: https://www.spotrac.com/liv/rankings/year/2022/.
