Full article: High order concentrated matrix-exponential distributions

Formulae display: $MathJax Logo$ ?Mathematical formulae have been encoded as MathML and are displayed in this HTML version using MathJax in order to improve their display. Uncheck the box to turn MathJax off. This feature requires Javascript. Click on a formula to zoom.

Abstract

This paper presents matrix-exponential (ME) distributions, whose squared coefficient of variation (SCV) is very low. Currently, there is no symbolic construction available to obtain the most concentrated ME distributions, and the numerical optimization-based approaches to construct them have many pitfalls. We present a numerical optimization-based procedure which avoids numerical issues.

Keywords:

1. Introduction

Highly concentrated matrix-exponential functions play an important role in many research fields, for example, they turned out to be essential for numerical inverse Laplace transform methods as well.^[Citation⁸^]

The least varying phase type (PH) distribution of order N is known to be the Erlang distribution^[Citation¹^] with SCV= $1 / N$ (defined as $\frac{μ_{0} μ_{2}}{μ_{1}^{2}} - 1,$ where $μ_{i}, i \in {0, 1, 2},$ are the moments of the distribution). Much less is known about the least varying ME distribution for order N. It is known that for order 2 the class of ME distributions is identical to the class of PH distributions, and it is also known that there exists an order 3 ME distribution with SCV = $0.200902 < 1 / 3,$ but it is still only a conjecture that this is the least varying order 3 ME distribution. Concentrated ME distributions are provided in Éltető et al.^[Citation⁴^] up to order 17 and in Horváth et al.^[Citation⁷^] up to order 47. These preliminary results indicate that, as N increases, the minimal SCV of order N ME distributions tends to be less than $2 / N^{2}$ for odd N, and for even N, the minimal SCV is close to the minimal SCV of order N – 1 ME distributions. In this paper, we propose numerical procedures by which much higher odd order concentrated ME distributions can be computed, and based on that, we refine the dependence of the minimal SCV on the order.

The rest of the paper is organized as follows. In Section 2, we provide the basic definition of the considered set of functions and the SCV. Section 3 presents an algorithm for efficient SCV computation of exponential cosine-square functions, while Section 4 introduces the numerical optimization procedure to obtain ME functions with low SCV. A heuristic approach with only 3 parameters is proposed in Section 5 and Section 6 concludes the paper.

2. Concentrated ME distributions

In this paper, we focus on real functions and a distribution on $R^{+} .$

Definition 2.1.

Order N ME functions (referred to as ME(N)) are given by (1) $f (t) = \underline{α} e^{A t} (- A) 1, t \geq 0,$ (1) where $\underline{α}$ is a real row vector of size N, A is a real matrix of size N × N and $1$ is the column vector of ones of size N, $\underline{α}$ is such that $\underline{α} 1 > 0$ and $f (t) \geq 0$ for $t \geq 0 .$

According to Equation(1)(1) $f (t) = \underline{α} e^{A t} (- A) 1, t \geq 0,$ (1) , vector $\underline{α}$ and matrix A define the ME function. We refer to the pair $(\underline{α}, A)$ as a matrix representation in the sequel. The matrix representation is not unique. Applying a similarity transformation with a nonsingular matrix T, for which $T 1 = 1$ holds, we obtain vector $\underline{β} = \underline{α} T^{- 1}$ and matrix $B = TA T^{- 1},$ that define the same ME function, since (2) $\begin{matrix} \underline{β} e^{B t} (- B) 1 = \underline{α} T^{- 1} e^{TA T^{- 1} t} (- TA T^{- 1}) 1 \\ = \underline{α} T^{- 1} T e^{A t} T^{- 1} T (- A) T^{- 1} 1 = \underline{α} e^{A t} (- A) 1 = f (t) . \end{matrix}$ (2)

Definition 2.2.

The SCV of the real function f(t) is defined as (3) $SCV (f (t)) = \frac{μ_{0} μ_{2}}{μ_{1}^{2}} - 1,$ (3) where $μ_{i} = \int_{t = 0}^{\infty} t^{i} f (t) d t$ for $i \in {0, 1, 2} .$

According to Definition 2.2, the SCV is insensitive to multiplication and scaling, because $\int_{t = 0}^{\infty} t^{i} c_{1} f (c_{2} t) d t = \frac{c_{1}}{c_{2}^{i + 1}} μ_{i}$ and (4) $SCV (c_{1} f (c_{2} t)) = \frac{\frac{c_{1}}{c_{2}} μ_{0} \frac{c_{1}}{c_{2}^{3}} μ_{2}}{{(\frac{c_{1}}{c_{2}^{2}} μ_{1})}^{2}} - 1 = \frac{μ_{0} μ_{2}}{μ_{1}^{2}} - 1 = SCV (f (t)) .$ (4)

Definition 2.3.

[Asmussen and Bladt^[Citation2] and Asmussen and O’Cinneide^[Citation³^]] The probability density function of an order N ME distribution is an ME(N) function with the following properties: $f (t) \geq 0$ for $t \geq 0$ and $\int_{t = 0}^{\infty} f (t) d t = 1 .$

If f(t) is a density function of an ME distribution, then μ_i denotes the ith moment and $SCV (f (t))$ the squared coefficient of variation (SCV) of the ME distribution. An ME distribution with density function f(t) is said to be concentrated if $SCV (f (t))$ is low.

Although ME functions have been used for many decades, there are still many questions open regarding their properties. Such an important question is how to decide efficiently if an ME function is non-negative for all t > 0. In general, $f (t) \geq 0$ does not hold for given $(\underline{α}, A)$ matrix representation for all t > 0, unless it has been constructed to be non-negative. In this paper, we are going to restrict our attention to such a special construction, the exponential-cosine square functions.

For the least varying ME(N) distributions only conjectures are available for $N \geq 3 .$ ^[Citation⁴^] According to the current conjecture, for odd N, the density function of the most concentrated ME(N) distribution belongs to a special class characterized by $f (t) = c_{1} f^{+} (c_{2} t),$ where c₁ and c₂ are positive constants and $f^{+} (t)$ is an exponential cosine-square function defined below.

Definition 2.4.

The set of exponential cosine-square functions of order n has the form (5) $f^{+} (t) = e^{- t} \prod_{i = 1}^{n} {cos}^{2} (\frac{ω t - ϕ_{i}}{2}), t \geq 0,$ (5) where $ω \geq 0$ and $0 \leq ϕ_{i} < π$ for $i \in {1, \dots, n} .$

Since the SCV is insensitive to multiplication and scaling, in the rest of the paper we focus on $SCV (f^{+} (t))$ with $f^{+} (t)$ as defined in Equation(5)(5) $f^{+} (t) = e^{- t} \prod_{i = 1}^{n} {cos}^{2} (\frac{ω t - ϕ_{i}}{2}), t \geq 0,$ (5) (that is, without normalization and scaling).

An exponential cosine-square function is defined by n + 1 parameters: ω and $ϕ_{i}$ for $i \in {1, \dots, n} .$ $f^{+} (t)$ is an ME( $2 n + 1$ ) function. Although the representation in Equation(5)(5) $f^{+} (t) = e^{- t} \prod_{i = 1}^{n} {cos}^{2} (\frac{ω t - ϕ_{i}}{2}), t \geq 0,$ (5) , which we refer to as the cosine-square representation, is not a matrix representation, [Horváth et al.,^[Citation⁷^] Appendix A] presents the associated matrix representation of size $N = 2 n + 1 .$ Consequently, the set of exponential cosine-square functions of order n define a special subset of ME( $2 n + 1$ ) since $f^{+} (t),$ by construction, is non-negative. The SCV of $f^{+} (t)$ is a complicated function of the parameters, whose minimum does not exhibit a closed analytic form. That is why we have resorted to the following numerical problem. For a given odd order $N = 2 n + 1,$ we are looking for efficient numerical methods for finding the ω and $ϕ_{i}$ ( $i \in {1, \dots, n}$ ) parameters which result in a low $SCV (f^{+} (t)) .$ For efficient numerical minimization of the SCV for N > 47 (i.e., n > 23) we need

an accurate computation of the SCV based on the parameters with low computational cost and
an efficient optimization procedure with low computational cost.

In this paper we present a method that addresses i) in Section 3, and one that addresses ii) in Section 4.

3. Efficient computation of the squared coefficient of variation

To evaluate the objective function of the optimization problem, namely the SCV, we need efficient methods to compute μ₀, μ₁ and μ₂. Deriving the μ_i parameters based on Equation(5)(5) $f^{+} (t) = e^{- t} \prod_{i = 1}^{n} {cos}^{2} (\frac{ω t - ϕ_{i}}{2}), t \geq 0,$ (5) is difficult for large N. Hence we propose to compute them based on a different representation.

3.1. The hyper-trigonometric representation

The following theorem defines the hyper-trigonometric form of the exponential cosine-square functions and provides a recursive procedure to obtain its parameters from ω and $ϕ_{i}, i \in {1, \dots, n} .$

Theorem 3.1.

An order $N = 2 n + 1$ exponential cosine-square function can be transformed to a hyper-trigonometric representation of form (6) $f^{+} (t) = c^{(n)} \cdot e^{- t} + e^{- t} \sum_{k = 1}^{n} a_{k}^{(n)} cos (k ω t) + e^{- t} \sum_{k = 1}^{n} b_{k}^{(n)} sin (k ω t),$ (6) where $t, ω \geq 0, c^{(n)} = \frac{1}{2} a_{0}^{(n)}$ and the coefficients $a_{k}^{(n)}, b_{k}^{(n)}$ are calculated recursively:

for n = 1: (7) $a_{0}^{(1)} = 1, b_{0}^{(1)} = 0, a_{1}^{(1)} = \frac{1}{2} cos ϕ_{1}, b_{1}^{(1)} = \frac{1}{2} sin ϕ_{1},$ (7)
for $k = 0, n \geq 1$ : (8) $a_{0}^{(n)} = \frac{1}{2} a_{0}^{(n - 1)} + \frac{1}{2} a_{1}^{(n - 1)} cos ϕ_{n} + \frac{1}{2} b_{1}^{(n - 1)} sin ϕ_{n},$ (8) (9) $b_{0}^{(n)} = 0,$ (9)
for $1 \leq k \leq n, n \geq 2$ (10) $a_{k}^{(n)} = \frac{1}{2} a_{k}^{(n - 1)} + \frac{1}{2} \frac{a_{k - 1}^{(n - 1)} + a_{k + 1}^{(n - 1)}}{2} cos ϕ_{n} + \frac{1}{2} \frac{b_{k + 1}^{(n - 1)} - b_{k - 1}^{(n - 1)}}{2} sin ϕ_{n},$ (10) (11) $b_{k}^{(n)} = \frac{1}{2} b_{k}^{(n - 1)} + \frac{1}{2} \frac{b_{k - 1}^{(n - 1)} + b_{k + 1}^{(n - 1)}}{2} cos ϕ_{n} + \frac{1}{2} \frac{a_{k - 1}^{(n - 1)} - a_{k + 1}^{(n - 1)}}{2} sin ϕ_{n}$ (11)

where

a_{k}^{(n)} = b_{k}^{(n)} = 0

for k > n.

Proof.

The theorem is proved by induction. For the order $N = 2 n + 1$ cosine-square function we introduce the notation $f^{(n)} (t) = \prod_{i = 1}^{n} {cos}^{2} (\frac{ω t - ϕ_{i}}{2}) .$ For n = 1, $\begin{matrix} f^{(1)} (t) = {cos}^{2} (\frac{ω t - ϕ_{1}}{2}) = \frac{1}{2} (1 + cos (ω t - ϕ_{1})) \\ = \frac{1}{2} (1 + cos (ω t) cos (ϕ_{1}) + sin (ω t) sin (ϕ_{1})), \end{matrix}$ and the theorem holds. Assuming that the theorem holds for n – 1, we are going to show that it holds for n as well. We have that $\begin{matrix} f^{(n)} (t) & = f^{(n - 1)} (t) {cos}^{2} (\frac{ω t - ϕ_{n}}{2}) = f^{(n - 1)} (t) (\frac{1}{2} + \frac{1}{2} cos (ω t - ϕ_{n})) \\ = f^{(n - 1)} (t) / 2 + f^{(n - 1)} (t) cos (ω t) cos (ϕ_{n}) / 2 + f^{(n - 1)} (t) sin (ω t) sin (ϕ_{n}) / 2 \\ = c^{(n - 1)} / 2 + \sum_{k = 1}^{n - 1} a_{k}^{(n - 1)} cos (k ω t) / 2 + \sum_{k = 1}^{n - 1} b_{k}^{(n - 1)} sin (k ω t) / 2 \\ + c^{(n - 1)} cos (ω t) cos (ϕ_{n}) / 2 + c^{(n - 1)} sin (ω t) sin (ϕ_{n}) / 2 \\ + \sum_{k = 1}^{n - 1} a_{k}^{(n - 1)} cos (k ω t) cos (ω t) cos (ϕ_{n}) / 2 \\ + \sum_{k = 1}^{n - 1} a_{k}^{(n - 1)} cos (k ω t) sin (ω t) sin (ϕ_{n}) / 2 \\ + \sum_{k = 1}^{n - 1} b_{k}^{(n - 1)} sin (k ω t) cos (ω t) cos (ϕ_{n}) / 2 \\ + \sum_{k = 1}^{n - 1} b_{k}^{(n - 1)} sin (k ω t) sin (ω t) sin (ϕ_{n}) / 2. \end{matrix}$

Using the identities $\begin{matrix} cos (k ω t) cos (ω t) = cos ((k + 1) ω t) / 2 + cos ((k - 1) ω t) / 2, \\ sin (k ω t) sin (ω t) = - cos ((k + 1) ω t) / 2 + cos ((k - 1) ω t) / 2, \\ cos (k ω t) sin (ω t) = sin ((k + 1) ω t) / 2 - sin ((k - 1) ω t) / 2, \\ sin (k ω t) cos (ω t) = sin ((k + 1) ω t) / 2 + sin ((k - 1) ω t) / 2, \end{matrix}$ and collecting the coefficients corresponding to $cos (k ω t)$ and $sin (k ω t)$ provides Equation(10)(10) $a_{k}^{(n)} = \frac{1}{2} a_{k}^{(n - 1)} + \frac{1}{2} \frac{a_{k - 1}^{(n - 1)} + a_{k + 1}^{(n - 1)}}{2} cos ϕ_{n} + \frac{1}{2} \frac{b_{k + 1}^{(n - 1)} - b_{k - 1}^{(n - 1)}}{2} sin ϕ_{n},$ (10) and Equation(11)(11) $b_{k}^{(n)} = \frac{1}{2} b_{k}^{(n - 1)} + \frac{1}{2} \frac{b_{k - 1}^{(n - 1)} + b_{k + 1}^{(n - 1)}}{2} cos ϕ_{n} + \frac{1}{2} \frac{a_{k - 1}^{(n - 1)} - a_{k + 1}^{(n - 1)}}{2} sin ϕ_{n}$ (11) . Terms from $cos ((k - 1) ω t)$ at k = 1 contribute to $c^{(n)},$ leading to $c^{(n)} = \frac{1}{2} a_{0}^{(n)} .$ The relations for the boundaries can be obtained in a similar manner. □

The hyper-trigonometric representation makes it possible to express the Laplace transform (LT) and the μ_i moments for $i \in {0, 1, 2}$ in a simple and compact way.

Corollary 3.2.

The LT and the $μ_{i}, i \in {0, 1, 2},$ moments of the exponential cosine-square function, for $s \in C$ , are given by (12) $f^{*} (s) = \int_{0}^{\infty} e^{- s t} f^{+} (t) d t = \frac{c^{(n)}}{1 + s} + \sum_{k = 1}^{n} \frac{a_{k}^{(n)} (1 + s) + b_{k}^{(n)} k ω}{{(1 + s)}^{2} + {(k ω)}^{2}},$ (12) and (13) $\begin{matrix} μ_{0} = c^{(n)} + \sum_{k = 1}^{n} \frac{a_{k}^{(n)} + b_{k}^{(n)} k ω}{1 + {(k ω)}^{2}}, \\ μ_{1} = c^{(n)} + \sum_{k = 1}^{n} \frac{a_{k}^{(n)} + 2 b_{k}^{(n)} k ω - a_{k}^{(n)} {(k ω)}^{2}}{{(1 + {(k ω)}^{2})}^{2}}, \\ μ_{2} = 2 c^{(n)} + \sum_{k = 1}^{n} \frac{2 a_{k}^{(n)} + 6 b_{k}^{(n)} k ω - 6 a_{k}^{(n)} {(k ω)}^{2} - 2 b_{k}^{(n)} {(k ω)}^{3}}{{(1 + {(k ω)}^{2})}^{3}} . \end{matrix}$ (13)

Proof.

Since Equation(6)(6) $f^{+} (t) = c^{(n)} \cdot e^{- t} + e^{- t} \sum_{k = 1}^{n} a_{k}^{(n)} cos (k ω t) + e^{- t} \sum_{k = 1}^{n} b_{k}^{(n)} sin (k ω t),$ (6) is linear, it can be Laplace transformed term-by-term using the relations $\begin{matrix} L T (e^{- t}) = \frac{1}{s + 1}, \\ L T (e^{- t} cos (ω t)) = \frac{s + 1}{{(s + 1)}^{2} + ω^{2}}, \\ L T (e^{- t} sin (ω t)) = \frac{ω}{{(s + 1)}^{2} + ω^{2}} . \end{matrix}$

Based on $f^{*} (s),$ the μ_i moments can be computed using the LT moment relation $μ_{i} = {(- 1)}^{i} \frac{d^{i}}{d s^{i}} f^{*} (s) |_{s = 0} .$

In order to compute the matrix representation of the exponential cosine-square function, we introduce one more representation, and the associated transformation.

3.2. The spectral representation

The hyper-trigonometric representation makes it easy to obtain the spectral form of $f^{+} (t)$ as (14) $\begin{matrix} f^{+} (t) = & c^{(n)} e^{- t} + \frac{1}{2} \sum_{k = 1}^{n} (a_{k}^{(n)} + i b_{k}^{(n)}) e^{- (1 + i k ω) t} + (a_{k}^{(n)} - i b_{k}^{(n)}) e^{- (1 - i k ω) t}, \end{matrix}$ (14) where $i$ denotes the imaginary unit. We refer to Equation(14)(14) $\begin{matrix} f^{+} (t) = & c^{(n)} e^{- t} + \frac{1}{2} \sum_{k = 1}^{n} (a_{k}^{(n)} + i b_{k}^{(n)}) e^{- (1 + i k ω) t} + (a_{k}^{(n)} - i b_{k}^{(n)}) e^{- (1 - i k ω) t}, \end{matrix}$ (14) as the spectral representation because these exponential coefficients are the eigenvalues of matrix A in the matrix representation.

3.3. The matrix representation

Corollary 3.3.

The $f^{+} (t) = e^{- t} \prod_{i = 1}^{n} {cos}^{2} (\frac{ω t - ϕ_{i}}{2})$ function has a size $N = 2 n + 1$ matrix representation $f^{+} (t) = \underline{α} e^{A t} (- A) 1$ , where the row vector $\underline{α}$ is composed by the elements (15) $α_{1} = c^{(n)},$ (15) (16) $α_{2 k} = \frac{1}{2} \frac{a_{k}^{(n)} (1 + k ω) - b_{k}^{(n)} (1 - k ω)}{1 + {(k ω)}^{2}},$ (16) (17) $α_{2 k + 1} = \frac{1}{2} \frac{a_{k}^{(n)} (1 - k ω) + b_{k}^{(n)} (1 + k ω)}{1 + {(k ω)}^{2}},$ (17) and the matrix A is given by (18) $A = [\begin{matrix} - 1 \\ Δ_{1} \\ ⋱ \\ Δ_{n} \end{matrix}],$ (18) where $Δ_{k} = [\begin{matrix} - 1 & - k ω \\ k ω & - 1 \end{matrix}]$ , for $k \in {1, \dots, n} .$

Proof.

According to Equation(18)(18) $A = [\begin{matrix} - 1 \\ Δ_{1} \\ ⋱ \\ Δ_{n} \end{matrix}],$ (18) , (19) $e^{A t} = [\begin{matrix} e^{- t} \\ e^{Δ_{1} t} \\ ⋱ \\ e^{Δ_{n} t} \end{matrix}]$ (19) where (20) $e^{Δ_{k} t} = e^{[\begin{matrix} - 1 & - k ω \\ k ω & - 1 \end{matrix}] t} = [\begin{matrix} e^{- t} cos (k ω t) & - e^{- t} sin (k ω t) \\ e^{- t} sin (k ω t) & e^{- t} cos (k ω t) \end{matrix}] .$ (20)

Based on Equation(19)(19) $e^{A t} = [\begin{matrix} e^{- t} \\ e^{Δ_{1} t} \\ ⋱ \\ e^{Δ_{n} t} \end{matrix}]$ (19) and Equation(20)(20) $e^{Δ_{k} t} = e^{[\begin{matrix} - 1 & - k ω \\ k ω & - 1 \end{matrix}] t} = [\begin{matrix} e^{- t} cos (k ω t) & - e^{- t} sin (k ω t) \\ e^{- t} sin (k ω t) & e^{- t} cos (k ω t) \end{matrix}] .$ (20) , we can write (21) $\begin{matrix} \underline{α} e^{A t} (- A) 1 = c^{(n)} e^{- t} + \sum_{k = 1}^{n} [\begin{matrix} α_{2 k} & α_{2 k + 1} \end{matrix}] e^{Δ_{k} t} [\begin{matrix} 1 + k ω \\ 1 - k ω \end{matrix}] \\ = c^{(n)} e^{- t} + \sum_{k = 1}^{n} [\begin{matrix} α_{2 k} & α_{2 k + 1} \end{matrix}] [\begin{matrix} e^{- t} cos (k ω t) & - e^{- t} sin (k ω t) \\ e^{- t} sin (k ω t) & e^{- t} cos (k ω t) \end{matrix}] [\begin{matrix} 1 + k ω \\ 1 - k ω \end{matrix}] \\ = c^{(n)} e^{- t} + \sum_{k = 1}^{n} e^{- t} (a_{k}^{(n)} cos (k ω t) + b_{k}^{(n)} sin (k ω t)), \end{matrix}$ (21)

which is identical with Equation(6)(6) $f^{+} (t) = c^{(n)} \cdot e^{- t} + e^{- t} \sum_{k = 1}^{n} a_{k}^{(n)} cos (k ω t) + e^{- t} \sum_{k = 1}^{n} b_{k}^{(n)} sin (k ω t),$ (6) . □

3.4. Numerical computation of the moments

Theorem 3.1 together with Corollary 3.2 provides a very efficient explicit method to compute the SCV based on the parameters $ω, ϕ_{i}, i \in {1, \dots, n} .$

There is one numerical issue that has to be taken care of when applying this numerical procedure with floating point arithmetic for large values of n. To evaluate the SCV, coefficients $a_{k}^{(n)}, b_{k}^{(n)}, c^{(n)}$ need to be obtained from the ω and $ϕ_{i}, i \in {1, \dots, n}$ parameters. The recursion defined in Theorem 3.1 involves multiplications between bounded numbers (sine and cosine always fall into $[- 1, + 1]$ ), which is beneficial from the numerical stability point of view, but subtractions are unfortunately also present, leading to a loss of precision. To overcome this loss of precision, we introduce increased precision floating point arithmetic both in our Mathematica and C++ implementations¹.^[Citation⁹^,Citation¹⁰^] Mathematica can quantify the precision loss, enabling us to investigate this issue experimentally. According to , the number of accurate decimal digits lost when evaluating the SCV from the $ω, ϕ_{i}$ parameters (computed by the Precision function of Mathematica), denoted by L_n, is nearly linear and can be approximated by (22) $L_{n} \approx 1.487 + 0.647 n .$ (22)

Figure 1. The precision loss while computing the SCV.

In the forthcoming numerical experiments we have set the floating point precision to $L_{n} + 16$ decimal digits to obtain an accuracy of results up to 16 decimal digits, and this precision setting eliminated all numerical issues.

It is important to note that the high precision is needed only to calculate the $a_{k}^{(n)}, b_{k}^{(n)}, c^{(n)}$ coefficients and the SCV itself. Representing parameters $ω, ϕ_{i}, i \in {1, \dots, n}$ themselves does not need extra precision, and the resulting exponential cosine-square function $f^{+} (t)$ can be evaluated with machine precision as well (in the range of our interest, $n \leq 1000$ ).

A basic pseudo-code of the computation of the SCV with the indications where high precision is needed is provided by Algorithm 1.

Algorithm 1.

Pseudo-code for the computation of the SCV

procedure ComputeSCV( $n, ω, ϕ_{i}, i \in {1, \dots, n}$ )
Compute the required precision, L_n, from Equation(22)(22) $L_{n} \approx 1.487 + 0.647 n .$ (22)
Convert $ω, ϕ_{i}, i \in {1, \dots, n}$ to $L_{n} + 16$ digits precision
Calculate $a_{k}^{(n)}, b_{k}^{(n)}, k \in {1, \dots, n}, c^{(n)}$ by Theorem 3.1 (high precision)
Calculate moments $μ_{0}, μ_{1}, μ_{2}$ according to Equation(13)(13) $\begin{matrix} μ_{0} = c^{(n)} + \sum_{k = 1}^{n} \frac{a_{k}^{(n)} + b_{k}^{(n)} k ω}{1 + {(k ω)}^{2}}, \\ μ_{1} = c^{(n)} + \sum_{k = 1}^{n} \frac{a_{k}^{(n)} + 2 b_{k}^{(n)} k ω - a_{k}^{(n)} {(k ω)}^{2}}{{(1 + {(k ω)}^{2})}^{2}}, \\ μ_{2} = 2 c^{(n)} + \sum_{k = 1}^{n} \frac{2 a_{k}^{(n)} + 6 b_{k}^{(n)} k ω - 6 a_{k}^{(n)} {(k ω)}^{2} - 2 b_{k}^{(n)} {(k ω)}^{3}}{{(1 + {(k ω)}^{2})}^{3}} . \end{matrix}$ (13) (machine precision)
Calculate $SCV = \frac{μ_{0} μ_{2}}{μ_{1}^{2}} - 1$ (machine precision)
return SCV
end procedure

4. Minimizing the squared coefficient of variation

Given the size of the representation $N = 2 n + 1,$ the $f^{+} (t)$ function providing the minimal SCV is obtained by minimizing Equation(3)(3) $SCV (f (t)) = \frac{μ_{0} μ_{2}}{μ_{1}^{2}} - 1,$ (3) subject to ω and $ϕ_{i}, i \in {1, \dots, n} .$ The form of the SCV does not allow a symbolic solution, and its numerical optimization is challenging, too. The surface to optimize has many local optima, hence simple gradient descent procedures failed to find the global optimum and are sensitive to the initial guess.

4.1. Optimizing the parameters

In the numerical optimization of the parameters, we had success with evolutionary optimization methods, in particular with evolution strategies. The results introduced in Horváth et al.^[Citation⁷^] were obtained by one of the simplest evolution strategies, the Rechenberg method.^[Citation¹¹^]. In Horváth et al.,^[Citation⁷^] it was the high computational demand of the numerical integration needed to obtain the SCV and its reduced accuracy that prevented the optimization for N > 47 (n > 23).

However, computing the SCV based on the hyper-trigonometric representation using the results of Section 3.1 allows us to evaluate the moments orders of magnitudes faster and more accurately, enabling the optimization for higher n values. With the Rechenberg method (Rechenberg^[Citation¹¹^], also referred to as (1 + 1)-ES in the literature) it is possible to obtain low SCV values relatively quickly for orders as high as n = 125, but these values are suboptimal in the majority of cases.

With more advanced evolution strategies the optimal SCV can be approached better. Our implementation^[Citation⁹^] supports the covariance matrix adoption evolution strategy (CMA-ES^[Citation⁵^]), and one of its variants, the BIPOP-CMA-ES with restarts.^[Citation⁶^]. Starting from a random initial guess, we got very low SCV values much quicker with the CMA-ES than with the (1 + 1)-ES with similar suboptimal minimum values (cf. ). The limit of applicability of CMA-ES is about n = 180. The best solution (lowest SCV for the given order), however, was always provided by the BIPOP-CMA-ES method, although it was by far the slowest among the three methods we studied. In fact, we believe that BIPOP-CMA-ES returned the global optimum for $n \in {1, \dots, 74},$ and we investigate the properties of those solutions in the next sections. For n > 74, we can still compute low SCV functions with the BIPOP-CMA-ES method, but its computation time gets to be prohibitive, and we are less confident about the global minimality of the results.

Figure 2. The minimal and the heuristic SCV as a function of order n in log-log scale.

For our particular problem, the running time, T, and the quality of the minimum, Q (how low the SCV is), obtained by the different optimization methods can be summarized as follows $\begin{array}{l} T_{CMA - ES} < T_{(1 + 1) - ES} ≪ T_{BIPOP - CMA - ES}, \\ Q_{CMA - ES} \sim Q_{(1 + 1) - ES} < Q_{BIPOP - CMA - ES} . \end{array}$

4.2. Properties of the minimal SCV solutions

The minimal SCV values obtained by the BIPOP-CMA-ES optimization, which we conjecture to be optimal for $n \in {1, \dots, 74},$ are depicted in . Apart from the minimal SCV values of the exponential cosine-square functions, also plots $1 / (2 n + 1)$ and $2 / {(2 n + 1)}^{2},$ for comparison. The $SCV = 1 / N = 1 / (2 n + 1)$ is known to be the minimal SCV value for PH distributions of order N,^[Citation¹^] which form a subset in the set of order N ME distributions.^[Citation²^] The $2 / N^{2} = 2 / {(2 n + 1)}^{2}$ curve is reported to be the approximate decay rate in Horváth^[Citation⁷^], up to n = 23 (N = 47).

Figure 3. The minimal SCV of the exponential cosine-square functions as the function of n in log-log scale.

indicates that the SCV decreases much faster than $1 / N$ and a bit faster than $2 / N^{2} .$ Indeed, $2 / N^{2}$ is a good approximation up to n = 23, but the decay seems to decrease below $2 / N^{2}$ for n > 23.

We suspect that the decrease is asymptotically polynomial (at $n \to \infty$ ), that we checked by plotting the $SCV n^{2.14}$ function in . While the exponent is determined empirically and might be slightly off, it is visible that the convergence is faster than $1 / n^{2}$ for the examined range (n < 1000).

Figure 4. The function $S (n) = S C V_{n} \cdot n^{2.14}$ with logarithmic y axis.

Figure 4. The function S(n)=SCVn·n2.14 with logarithmic y axis.

4.3. The parameters providing the minimal SCV

In this section, we investigate the parameters corresponding to the minimal SCV and provide an intuitive explanation for the observed behavior. depicts the optimal ω parameter as a function of n. It shows a slow decrease with some inhomogeneity around $n \in {14, 28, 43, 60} .$ suggests that ω tends to 0 as $n \to 0,$ and the inhomogeneity is related to the behavior of the $ϕ_{k}$ parameters, as it is detailed below.

Figure 5. The ω parameter providing the minimal SCV in lin-lin and log-log scales.

The visual appearance of the optimal $ϕ_{k}$ parameters in reveals many more interesting properties. First, since the period of the cosine-square function is π, the $ϕ_{k}$ parameters in Equation(5)(5) $f^{+} (t) = e^{- t} \prod_{i = 1}^{n} {cos}^{2} (\frac{ω t - ϕ_{i}}{2}), t \geq 0,$ (5) are $2 π$ periodic. That is, adding an integer times $2 π$ to any of the $ϕ_{k}$ parameters does not change the $f^{+} (t)$ function. In we transformed all $ϕ_{k}$ parameters to the $(- π, π)$ range and depicted $ϕ_{k} + π$ instead of $ϕ_{k}$ because $ϕ_{k} + π$ indicates the location where the cosine-square term with $ϕ_{k}$ is zero in the $(0, 2 π)$ cycle, which is the $(0, 2 π / ω)$ interval of $f^{+} (t) .$ The nth row in depicts n points, which are the optimal $ϕ_{k} + π$ values for $k = 1, \dots, n .$ In these n points of the $(0, 2 π)$ cycle $f^{+} (t)$ equals zero. In between these zeros $f^{+} (t)$ has humps.

Figure 6. The location of the $ϕ_{k}, k = 1, \dots, n$ parameters providing the minimal SCV.

Figure 6. The location of the ϕk,k=1,…,n parameters providing the minimal SCV.

and demonstrate the effect of $ϕ_{k}$ on $f^{+} (t)$ for ω = 1, n = 3, $ϕ_{i} + π \in {0.1, 1, 2}$ for $i \in {1, 2, 3} .$ In the $(0, 2 π)$ interval, both, $f^{+} (t)$ and $\prod_{i = 1}^{n} {cos}^{2} (\frac{t - ϕ_{i}}{2}),$ have zeros at 0.1, 1 and 2. depicts $f^{+} (t)$ with the exponential attenuation, while depicts $\prod_{i = 1}^{n} {cos}^{2} (\frac{t - ϕ_{i}}{2})$ without the exponential attenuation. In the sizes of the humps depend on the distance of the neighboring zeros. The hump between 0.1 and 1 is smaller than the one between 1 and 2, which indicates that the closer the neighboring zeros are the smaller the humps are. In , the exponential attenuation also affects the sizes of the humps. With the exponential attenuation the hump between 0.1 and 1 is larger than the one between 1 and 2. Additionally, the SCV is more sensitive to the humps further from the main peak, which motivates the fact that the $ϕ_{k} + π$ parameters are more concentrated around 0 to make the function as flat as possible near t = 0, where the exponential attenuation is rather weak. In , right of the main peak, $f^{+} (t)$ is suppressed by the exponential attenuation, while left of the peak the zeros of the cosine square terms keep the function low (c.f. ).

Figure 7. $f^{+} (t)$ with ω = 1, n = 3, $ϕ_{i} + π \in {0.1, 1, 2}$ for $i \in {1, 2, 3} .$

Figure 7. f+(t) with ω = 1, n = 3, ϕi+π∈{0.1,1,2} for i∈{1,2,3}.

Figure 8. The initial part of $\prod_{i = 1}^{n} {cos}^{2} (\frac{ω t - ϕ_{i}}{2})$ (without exponential attenuation), with ω = 1, n = 3, $ϕ_{i} + π \in {0.1, 1, 2}$ for $i \in {1, 2, 3} .$

Figure 8. The initial part of ∏i=1n cos 2(ωt−ϕi2) (without exponential attenuation), with ω = 1, n = 3, ϕi+π∈{0.1,1,2} for i∈{1,2,3}.

In , for n < 14, the zeros are located between 0 and 5, which means that in this range of $n f^{+} (t)$ is kept close to zero by the cosine-square functions in the interval (0, 5). It has a peak between 5 and $2 π,$ and the next cycles are suppressed by the exponential attenuation.

For $n \geq 14,$ there is a gap in the sequence of $ϕ_{k}$ parameters, indicating the location of a spike of $f^{+} (t) .$ It means that the exponential attenuation is not strong enough for suppressing $f^{+} (t)$ right after the spike, and the minimal SCV is obtained when some cosine-square terms are used to enforce the suppression beyond the spike.

The number of cosine-square terms used to suppress $f^{+} (t)$ beyond the spike changes at $n \in {14, 28, 43, 60} .$ These changes result in the small inhomogeneity in the ω values at $n \in {14, 28, 43, 60}$ in .

According to our intuitive understanding ω tends to 0 as $n \to \infty,$ because the cosine squared terms are more efficient in suppressing $f^{+} (t)$ than the exponential attenuation, and for large n, the cosine squared terms create a sharp spike inside the $(0, 2 π)$ cycle (which is the $(0, 2 π / ω)$ interval), such that the zeros are located on both sides of the spike. The cosine squared terms are $2 π / ω$ periodic and consequently, there is a spike also in the $(2 π / ω, 4 π / ω)$ interval which is suppressed by exponential attenuation. The exponential attenuation in a $2 π / ω$ long interval is $e^{2 π / ω} .$ In order to efficiently suppress the spike in the $(2 π / ω, 4 π / ω)$ interval, ω has to be small.

5. Heuristic optimization with 3 parameters

According to the previously discussed approach the number of parameters to optimize increases with n. This drawback limits the applicability of the general optimization procedures to about $n \leq 74$ in the case of BIPOP-CMA-ES and about $n \leq 180$ in the case of the basic CMA-ES. Using these n values the optimization procedure takes several days to terminate on an average PC clocked at 3.4 GHz.

While the $f^{+} (t)$ function obtained this way for n = 180 has an extremely low SCV ( $\approx 10^{- 5}$ ) already, some applications might benefit from ME distributions with even lower SCV. To overcome this limitation we developed a suboptimal heuristic procedure, that aims to obtain low SCV for a given large order n.

Our heuristic procedure has to optimize only three parameters, independent of the order n. The procedure is based on the assumption that the location of the spike in the $(0, 2 π)$ cycle of the cosine-squared function plays the most important role in the SCV, and the exact values of the $ϕ_{k}$ parameters are less important, the only important feature is that the cosine-squared terms characterized by the $ϕ_{k}$ parameters should suppress $f^{+} (t)$ uniformly in the $(0, 2 π)$ cycle – apart from the spike (cf. ). In case of ME distributions, the spike is the dominant mode of the density function.

Figure 9. The spike and the zeros of $f^{+} (t) .$

Based on this assumption we set the $ϕ_{k}$ parameters of the cosine-squared terms equidistantly. This way the position of the spike (denoted by p) and its width (w) inside the $(0, 2 π)$ interval completely define the $ϕ_{k}$ values for a given order n.

The distance of the $ϕ_{k}$ parameters (d) and the number of $ϕ_{k}$ parameters before the spike (i) can be computed from p and w by (23) $d = \frac{2 π - w}{n}, i = ⌊ \frac{p - w / 2}{d} + \frac{1}{2} ⌋,$ (23) and for $k \in {1, \dots, n}$ the $ϕ_{k}$ parameters are (24) $ϕ_{k} = {\begin{matrix} (k - 1 / 2) d - π & if k \leq i, \\ (k - 1 / 2) d + w - π & if k > i . \end{matrix}$ (24)

The obtained heuristic procedure has only 3 parameters to optimize: ω, p and w (see Algorithm 2). The SCV values computed by this heuristic optimization procedure are depicted in for $n \in {10, \dots, 74} .$ The figure indicates that for small n (n < 15) the procedure is inaccurate, but it is not a problem because the minimal SCV can be computed quickly in these cases. For larger n ( $n \geq 15$ ) the SCV provided by the heuristic procedure is less than twice the minimal SCV in the given range. Assuming that this ratio to the optimal SCV remains valid also for n > 74 the heuristic procedure which is applicable up to n = 1000, is an efficient tool to compute highly concentrated ME distributions for large order n values.

Figure 10. The minimal and the heuristic SCV as a function of order n in log-log scale.

Algorithm 2.

The objective function of the heuristic method

procedure ComputeSCV-heuristic( $ω, p, w$ )
Obtain the distance of zeros, d, and threshold i by Equation(23)(23) $d = \frac{2 π - w}{n}, i = ⌊ \frac{p - w / 2}{d} + \frac{1}{2} ⌋,$ (23)
Obtain $ϕ_{i}$ for $i \in {1, \dots, n}$ by Equation(24)(24) $ϕ_{k} = {\begin{matrix} (k - 1 / 2) d - π & if k \leq i, \\ (k - 1 / 2) d + w - π & if k > i . \end{matrix}$ (24)
Compute SCV by Algorithm 1
return SCV
end procedure

depicts the SCV obtained by the heuristic procedure for large n values, compared with the outputs of the highly accurate BIPOP-CMA-ES and the faster CMA-ES optimization procedures. suggests that the heuristic optimization remains very close to the minimum also for larger n values and the SCV obtained by the heuristic optimization maintains its polynomial decay between $n^{- 2.1}$ and $n^{- 2.2} .$

5.1. Implementation notes

The reader can validate the results of the paper based on the parameters we computed for a representative set of n values between 1 and 1000 and made available at https://github.com/ghorvath78/iltcme.^[Citation¹⁰^] The file itccme.json contains (among other parameters) the $a_{k}^{(n)}, b_{k}^{(n)}, c^{(n)}$ parameters for the computed set of n values from which $f^{+} (t)$ can be plotted using Equation(14)(14) $\begin{matrix} f^{+} (t) = & c^{(n)} e^{- t} + \frac{1}{2} \sum_{k = 1}^{n} (a_{k}^{(n)} + i b_{k}^{(n)}) e^{- (1 + i k ω) t} + (a_{k}^{(n)} - i b_{k}^{(n)}) e^{- (1 - i k ω) t}, \end{matrix}$ (14) (to check the non-negativity of $f^{+} (t)$ ), and the SCV can be computed using Equation(13)(13) $\begin{matrix} μ_{0} = c^{(n)} + \sum_{k = 1}^{n} \frac{a_{k}^{(n)} + b_{k}^{(n)} k ω}{1 + {(k ω)}^{2}}, \\ μ_{1} = c^{(n)} + \sum_{k = 1}^{n} \frac{a_{k}^{(n)} + 2 b_{k}^{(n)} k ω - a_{k}^{(n)} {(k ω)}^{2}}{{(1 + {(k ω)}^{2})}^{2}}, \\ μ_{2} = 2 c^{(n)} + \sum_{k = 1}^{n} \frac{2 a_{k}^{(n)} + 6 b_{k}^{(n)} k ω - 6 a_{k}^{(n)} {(k ω)}^{2} - 2 b_{k}^{(n)} {(k ω)}^{3}}{{(1 + {(k ω)}^{2})}^{3}} . \end{matrix}$ (13) in any general mathematical programing environments. Some related functions processing the parameters in itccme.json are also available at https://github.com/ghorvath78/iltcme^[Citation¹⁰^] in various programing environments. Additionally, the implementation of the optimization procedure at https://github.com/ghorvath78/cmefinder^[Citation⁹^] allows the reader to reproduce the parameters of the CME distributions presented in itccme.json.

The parameter setting of the code in https://github.com/ghorvath78/cmefinder^[Citation⁹^] resulted from the trends of the parameters depicted in and , which can also be projected to unexplored orders with some confidence. Our numerical experiments suggest that there are several local optima with similar SCV in the range of initial guesses and outside that range the obtained SCV is several orders of magnitude larger. That is, the optimization is sensitive to the initial guess, but with a proper initial guess it is insensitive to implementation details like the stopping criteria.

6. Conclusion

The paper presents a method to generate ME distributions with very low SCV. The hyper-trigonometric representation of the subset of ME distributions with exponential cosine-squared density function enables the efficient, explicit computation of the squared coefficient of variation. By selecting the appropriate numerical precision and a suitable numerical optimization method, we managed to create ME functions up to order n = 1000 with very low SCV ( $< 10^{- 6}$ for n = 1000). Such non-negative, low-SCV matrix exponential functions are important ingredients in several numerical procedures, including the numerical inverse Laplace transform and representing deterministic delay in stochastic models.

Note

Additional information

Funding

This work is partially supported by the OTKA K-123914 and the TUDFO/51757/2019-ITM grants.

Notes

1 In C++ we used the mpfr library for multi-precision computations.

References

Aldous, D.; Shepp, L. The least variable phase type distribution is Erlang. Stoch. Models 1987, 3, 467–473.
Google Scholar
Asmussen, S.; Bladt, M. Renewal theory and queueing algorithms for matrix-exponential distributions. In Matrix-Analytic Methods in Stochastic Models. Volume 183 of Lecture Notes in Pure and Applied Mathematics; Alfa, A. S., Chakravarthy, S. R., Eds.; New York: Marcel Dekker, 1996; pp 313–341.
Google Scholar
Asmussen, S.; O’Cinneide, C. A. Matrix-exponential distributions – Distributions with a rational Laplace transform. In Encyclopedia of Statistical Sciences; Kotz, S.; Read, C., Eds.; John Wiley & Sons: New York, 1997; pp 435–440.
Google Scholar
Éltető, T.; Rácz, S.; Telek, M. Minimal coefficient of variation of matrix exponential distributions. Presented at 2nd Madrid Conference on Queueing Theory, Madrid, Spain, Jul 2006.
Google Scholar
Hansen, N. The CMA evolution strategy: A comparing review. In Towards a New Evolutionary Computation; Lozano, J. A., Larrañaga, P., Inza, I., Bengoetxea, E., Eds.; Springer: Berlin, Heidelberg, 2006; pp 75–102.
Google Scholar
Hansen, N. Benchmarking a BI-population CMA-ES on the BBOB-2009 function testbed. Presented at Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers, ACM, Montreal, Québec, Canada, July 8–12, 2009, pp 2389–2396.
Google Scholar
Horváth, I.; Sáfár, O.; Telek, M.; Zámbó, B. Concentrated matrix exponential distributions. In European Workshop on Performance Engineering; Fiems, D., Paolieri, M., Platis, A. N., Eds.; Cham, Switzerland: Springer, 2016; pp 18–31.
Google Scholar
Horváth, I.; Talyigás, Z.; Telek, M. An optimal inverse Laplace transform method without positive and negative overshoot – An integral based interpretation. Electron. Notes Theor. Comput. Sci. 2018, 337, 87–104. DOI: 10.1016/j.entcs.2018.03.035.
Google Scholar
Optimization procedure for finding ME(n) with low SCV. https://github.com/ghorvath78/cmefinder [accessed Sep 22, 2019].
Google Scholar
Parameters of ME(n) distributions with low SCV. https://github.com/ghorvath78/iltcme [accessed Sep 22, 2019].
Google Scholar
Rechenberg, I. Evolutionsstrategien. In Simulationsmethoden in der Medizin und Biologie; Schneider, B., Ranft, U., Eds.; Springer-Verlag: Berlin, Heidelberg, 1978; pp 83–114.
Google Scholar

High order concentrated matrix-exponential distributions

Abstract

1. Introduction

2. Concentrated ME distributions

3. Efficient computation of the squared coefficient of variation