On bounding the Thompson metric by Schatten norms

Abstract

The Thompson metric provides key geometric insights in the study of non-linear matrix equations and in many optimization problems. However, knowing that an approximate solution is within $d_{T}$ units, in the Thompson metric, of the actual solution provides little insight into how good the approximation is as a matrix or vector approximation. That is, bounding the Thompson metric between an approximate and an accurate solution to a problem does not provide obvious bounds for either the spectral or the Frobenius norm, both Schatten norms, of the difference between the approximate and accurate solutions. This paper reports such an upper bound, namely that $∥ X - Y ∥_{p} \leq 2^{\frac{1}{p}} \frac{e^{d} - 1}{e^{d}} \max [∥ X ∥_{p}, ∥ Y ∥_{p}]$, where $∥ \cdot ∥_{p}$ denotes the Schatten p-norm and $d$ denotes the Thompson metric between $X$ and $Y$. Furthermore, a more geometric proof leads to a slightly better bound in the case of the Frobenius norm, $∥ X - Y ∥_{2} \leq \frac{e^{d} - 1}{\sqrt{e^{2 d} + 1}} \sqrt{∥ X ∥_{2}^{2} + ∥ Y ∥_{2}^{2}} \leq 2^{\frac{1}{2}} \frac{e^{d} - 1}{\sqrt{e^{2 d} + 1}} \max [∥ X ∥_{2}, ∥ Y ∥_{2}]$.

PUBLIC INTEREST STATEMENT

Metrics are functions, mapping pairs of points to non-negative real numbers, which generalize the concept of distance to apply to abstract spaces. The Thompson metric provides critical geometric insights into dynamical systems, optimization problems and solving systems of equations. However, the Thompson metric is not an intuitive generalization of the concept of distance. The Thompson metric does provide an upper bound for more intuitive measurements of distance, such as those based on Schatten norms, but the currently known relation between Thompson metrics and more intuitive generalizations of distance is not always a tight bound. This paper presents a tighter bound relating metrics based on Schatten norms to Thompson metrics. The results in this paper can refine our geometrical understanding of problems arising in fluid mechanics, in geophysics as well as in robotics, and may improve assessments of the quality of data and image processing techniques.

1. Introduction

The Thompson metric is a variant of the Hilbert metric (Nussbaum & Walsh, 2004). The Hilbert metric generalizes the metric structure of hyperbolic geometry to the generalized concept of cones used in the study of Banach (complete normed vector) spaces, such as the space of Hermitian matrices. When applied to the unit disk, the Hilbert metric yields the Klein model of hyperbolic geometry, but when applied to a cone, such as the cone of positive definite or positive semidefinite matrices, the Hilbert metric is actually a pseudometric. A slight tweak of the Hilbert metric yields the Thompson (part) metric: the Thompson metric $d_{T} (X, Y)$ is the minimal $d_{T} = \log (α)$ such that $α X - Y$ and $α Y - X$ are both positive semidefinite. The Thompson metric is well defined over the cone of positive definite matrices but may be infinite when applied to other matrices, such as positive semidefinite matrices.

The Thompson metric (Lemmens & Roelands, 2015; Nussbaum & Walsh, 2004) provides key geometric insights into the study of non-linear matrix equations. In particular, many flows, which in other metrics may not even be contractions, have well-characterized contraction rates in the Thompson metric (Lee & Lim, 2008). That flows arising in many non-linear optimization, filtering and control problems are contractions in the Thompson metric (Carli & Sepulchre, 2015; Del Moral, Kurtzmann, & Tugaut, 2017; Gaubert & Qu, 2014; Lawson & Lim, 2007; Qu, 2014) endows this metric with great utility. Applications of the Thompson metric range from proofs of the existence and uniqueness of positive definite solutions for many types of non-linear equations (Liao, Yao, & Duan, 2010) to non-linear optimization theory (Gaubert & Qu, 2014; Montrucchio, 1998) and nonlinear Perron-Frobenius theory (Lemmens & Nussbaum, 2012; Nussbaum, 1988). Relatedly, matrix bounds in the Löwner order characterize the error in approximate solutions to the continuous algebraic Riccati equation (Zhang & Liu, 2010).

While the Thompson metric is convenient for solving many optimization problems involving matrices, it is often more intuitive to view matrices solving such problems within more typical geometric contexts. Knowing that the solution of a problem $X$ and its $n$th approximation $X_{n}$ are $d_{T}$ units apart in the Thompson metric provides little indication of how close $X_{n}$ is to $X$. That is, knowing that $X_{n} \leq α X$ and $X \leq α X_{n}$ in the Löwner ordering (Baksalary & Pukelsheim, 1991), where $α = e^{d_{T}}$, does not intuitively bound $∥ X - X_{n} ∥$ for any of the usual matrix norms $∥ \cdot ∥$. But it is $∥ X - X_{n} ∥$ in a suitable matrix norm, not $d_{T}$ or similar expressions relating $X$ and $X_{n}$ in the Löwner ordering, that provides insight into the quality of an approximation $X_{n}$.

In particular, considering the matrices $X_{n}$ and $X$ as linear operators on Euclidean vector spaces, the spectral norm, i.e. a Schatten p-norm with $p = \infty$, of $X - X_{n}$ is the relevant measure of how well $X_{n}$ approximates $X$. Considering these matrices as themselves vectors in a Euclidean space, the relevant assessment of how well $X_{n}$ approximates $X$ is the Frobenius norm, i.e. a Schatten norm with $p = 2$, of $X - X_{n}$. Therefore, it is useful to know an upper bound for the Schatten p-norm $∥ X - X_{n} ∥_{p}$ given some minimal information about $X$ (e.g. its norm) as well as the Thompson metric $d = d_{T} (X, X_{n})$. For a cone with normality constant $δ$ in a Banach space, the following inequality holds (Lemmens & Nussbaum, 2012; Nussbaum, 1988): $∥ X - Y ∥ \leq (1 + 2 δ) (e^{d_{T} (X, Y)} - 1) \max [∥ X ∥, ∥ Y ∥]$. However, this inequality does not preclude the existence of tighter bounds relating specific norms and Thompson metrics, such as Schatten p-norms and the Thompson metric induced by the Löwner order on the cone of positive semidefinite matrices.

This paper thus seeks to fill this important gap in our understanding of the relationship between Thompson metrics and Schatten norms by providing an upper bound for the Schatten p-norm $∥ X - Y ∥_{p}$ given the Thompson metric $d = d_{T} (X, Y)$ as well as the Schatten p-norms of $X$ and $Y$. In particular, the application of Weyl’s inequalities establishes that $∥ X - Y ∥_{p} \leq 2^{\frac{1}{p}} \frac{e^{d} - 1}{e^{d}} \max [∥ X ∥_{p}, ∥ Y ∥_{p}]$. Hopefully, this paper will serve as the beginning of a conversation leading to ever tighter bounds on $∥ X - Y ∥_{p}$ given $d = d_{T} (X, Y)$ as well as minimal information about $X$ and $Y$, such as their norms and perhaps some knowledge of their spectra of eigenvalues.

2. Preliminaries

This paper uses a consistent set of letters and symbols to denote certain matrices and their norms and eigenvalues. Let $X$ and $Y$ each denote positive definite Hermitian matrices with eigenvalues $χ_{1} \geq \dots \geq χ_{n}$ and $υ_{1} \geq \dots \geq υ_{n}$, respectively. The proofs presented in this paper do not explicitly require that the matrices be positive definite. However, when a matrix is not positive definite, the Thompson metric may be infinite, in which case the results presented here are trivial, as any finite quantity is $\leq \infty$. Thus, this paper will focus on positive definite matrices $X$ and $Y$. Denote the eigenvalues of the matrix $Δ = X - Y$ by $δ_{1} \geq \dots \geq δ_{n}$ and those of $E = - Δ = Y - X$ by $ε_{1} \geq \dots \geq ε_{n}$. Note that $δ_{i} = - ε_{n - i + 1}$. $∥ M ∥$ denotes a Schatten norm of the matrix $M$ and $∥ M ∥_{p}$ specifically denotes the Schatten p-norm (which is a norm for p such that $1 \leq p \leq \infty$). Note that $∥ M ∥_{p}$ is a function of the eigenvalues $μ_{1} \geq \dots \geq μ_{n}$ of $M$: $∥ M ∥_{p} = f_{p} (μ_{1}, \dots, μ_{n}) = {(\sum_{i = 1}^{n} {|μ_{i}|}^{p})}^{1 / p}$. Similarly, this paper will use the notation $f (μ_{1}, \dots, μ_{n})$ for the functional form of $∥ M ∥$. Depending on the context, $\leq$ and $\geq$ denote either the usual ordering on real numbers or the Löwner ordering on matrices, i.e. $X \leq Y$ indicates that $Y - X$ is positive semidefinite. In terms of the Löwner ordering, the Thompson metric $d_{T} (X, Y)$ is the minimal $d_{T} = \log (α)$ such that $Y \leq α X$ and $X \leq α Y$ (Nussbaum & Walsh, 2004). As is standard, $\mathrm{tr} (M)$ denotes the trace of the matrix $M$.
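The definitions above can be made concrete with a minimal sketch, restricted (as an assumption made here for brevity) to real symmetric positive definite $2 \times 2$ matrices so that all eigenvalues have closed forms and only the Python standard library is needed. The function names `eig_sym2`, `schatten_norm` and `thompson_metric` are hypothetical and are not taken from the paper. The sketch uses the fact that $Y \leq α X$ and $X \leq α Y$ hold exactly when every eigenvalue of $X^{-1} Y$ lies in $[1/α, α]$.

```python
import math


def eig_sym2(m):
    """Eigenvalues (largest first) of a real symmetric 2x2 matrix [[a, b], [b, c]]."""
    (a, b), (_, c) = m
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return [mean + radius, mean - radius]


def schatten_norm(m, p):
    """Schatten p-norm of a symmetric 2x2 matrix; p = math.inf gives the spectral norm."""
    mags = [abs(v) for v in eig_sym2(m)]
    if math.isinf(p):
        return max(mags)
    return sum(v ** p for v in mags) ** (1.0 / p)


def thompson_metric(x, y):
    """log(alpha) for the smallest alpha with X <= alpha*Y and Y <= alpha*X.

    The roots t of det(Y - t*X) = 0 are the eigenvalues of X^{-1} Y, so
    alpha = max(t_max, 1 / t_min). Assumes X and Y are positive definite.
    """
    (x11, x12), (_, x22) = x
    (y11, y12), (_, y22) = y
    qa = x11 * x22 - x12 * x12                      # det(X)
    qb = -(x11 * y22 + x22 * y11 - 2.0 * x12 * y12)
    qc = y11 * y22 - y12 * y12                      # det(Y)
    disc = math.sqrt(max(qb * qb - 4.0 * qa * qc, 0.0))
    t_max = (-qb + disc) / (2.0 * qa)
    t_min = (-qb - disc) / (2.0 * qa)
    return math.log(max(t_max, 1.0 / t_min))


if __name__ == "__main__":
    X = [[5.0, 0.0], [0.0, 1.0]]
    Y = [[10.0, 0.0], [0.0, 1.0]]
    # Here 2X - Y and 2Y - X are both positive semidefinite, so alpha = 2.
    print(thompson_metric(X, Y), math.log(2.0))
    print(schatten_norm(Y, 1), schatten_norm(Y, 2), schatten_norm(Y, math.inf))
```

For larger matrices the same idea would require a general symmetric eigenvalue solver in place of the closed-form $2 \times 2$ formulas.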

Key to the proofs in this paper are the well-established Weyl’s inequalities (Bhatia, 2007; Weyl, 1949) for the eigenvalues of Hermitian matrices: let $M$, $Y$ and $P$ be Hermitian matrices such that $M = Y + P$. Denote the eigenvalues of $M$, $Y$ and $P$ by $μ_{1} \geq \dots \geq μ_{n}$, $ν_{1} \geq \dots \geq ν_{n}$, and $ρ_{1} \geq \dots \geq ρ_{n}$, respectively. Then (Weyl’s inequalities) $ν_{i} + ρ_{n} \leq μ_{i} \leq ν_{i} + ρ_{1}$.
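As a quick illustration of Weyl’s inequalities, the following sketch checks $ν_{i} + ρ_{n} \leq μ_{i} \leq ν_{i} + ρ_{1}$ numerically for one pair of real symmetric $2 \times 2$ matrices. The helper names and the example matrices are assumptions chosen for illustration; a single example is a sanity check, not a proof.

```python
import math


def eig_sym2(m):
    """Eigenvalues (largest first) of a real symmetric 2x2 matrix [[a, b], [b, c]]."""
    (a, b), (_, c) = m
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return [mean + radius, mean - radius]


def weyl_holds(y, p, tol=1e-12):
    """Check nu_i + rho_n <= mu_i <= nu_i + rho_1 for M = Y + P (2x2, symmetric)."""
    m = [[y[r][c] + p[r][c] for c in range(2)] for r in range(2)]
    mu, nu, rho = eig_sym2(m), eig_sym2(y), eig_sym2(p)
    return all(
        nu[i] + rho[-1] - tol <= mu[i] <= nu[i] + rho[0] + tol
        for i in range(2)
    )


if __name__ == "__main__":
    Y = [[2.0, 1.0], [1.0, 3.0]]
    P = [[1.0, -2.0], [-2.0, 0.0]]   # an indefinite perturbation
    print(weyl_holds(Y, P))
```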

Use of Mathematica (Wolfram Research, Inc., 2016) proved invaluable in simplifying the equations and inequalities presented in this paper. Numerical results were calculated using MATLAB (MathWorks, Inc., 2017).

3. Proof of general case

The proof begins with a lemma applying Weyl’s inequalities to bound the eigenvalues of $Δ = X - Y$ by the eigenvalues of Y given upper and lower bounds for X in the Löwner ordering. The second lemma, a consequence of the first lemma, bounds the eigenvalues of $Δ = X - Y$ by the eigenvalues of X .

Lemma 3.1: Consider (positive definite) Hermitian matrices X and Y such that X ≤ αY and X ≥ βY. Then (A) $(α - 1) \cdot υ_{i} \geq δ_{i} \geq (β - 1) \cdot υ_{i}$ and (B) $|δ_{i}| \leq \max [|β - 1|, |α - 1|] \cdot υ_{i}$.

Proof:

Note that $Δ = X - α Y + (α - 1) \cdot Y = X - β Y + (β - 1) \cdot Y$. Let $α_{1}$ be the maximum eigenvalue of $X - α Y$ (which is negative semidefinite, as $α Y - X$ is positive semidefinite by the definition of X ≤ αY) and let $β_{n}$ be the minimum eigenvalue of $X - β Y$ (which is positive semidefinite by the definition of X ≥ βY). By hypothesis, $α_{1} \leq 0$ and $β_{n} \geq 0$. Note that the eigenvalues for $(α - 1) \cdot Y$ and $(β - 1) \cdot Y$ are $|α - 1| \cdot υ_{1}$, …, $|α - 1| \cdot υ_{n}$ and $|β - 1| \cdot υ_{1}$, …, $|β - 1| \cdot υ_{n}$, respectively. By Weyl’s inequalities, we have $α_{1} + (α - 1) \cdot υ_{i} \geq δ_{i} \geq β_{n} + (β - 1) \cdot υ_{i}$. Since $α_{1} \leq 0$ and $β_{n} \geq 0$, we have $(α - 1) \cdot υ_{i} \geq δ_{i} \geq (β - 1) \cdot υ_{i}$ and hence $|δ_{i}| \leq \max [|β - 1|, |α - 1|] \cdot υ_{i}$.

Lemma 3.2: Again, consider (positive definite) Hermitian matrices X and Y such that X ≤ αY and X ≥ βY. Then (A) $(\frac{1}{β} - 1) \cdot χ_{i} \geq δ_{i} \geq (\frac{1}{α} - 1) \cdot χ_{i}$ and (B) $|δ_{i}| \leq \max [|\frac{1}{α} - 1|, |\frac{1}{β} - 1|] \cdot χ_{i}$.

Proof:

X ≤ αY and X ≥ βY respectively imply $\frac{1}{α} X \leq Y$ and $\frac{1}{β} X \geq Y$ . Apply Lemma 3.1 to Y (in place of X), X (in place of Y), $\frac{1}{α}$ (in place of β) and $\frac{1}{β}$ (in place of α).

Theorem 3.3: Consider (positive definite) Hermitian matrices X and Y such that X ≤ αY and X ≥ βY. Then $∥ X - Y ∥ \leq \min \{\max [|α - 1|, |β - 1|] \cdot ∥ Y ∥, \max [|\frac{1}{α} - 1|, |\frac{1}{β} - 1|] \cdot ∥ X ∥\}$.

Proof:

Consider two sets of eigenvalues, $λ_{1} \geq \dots \geq λ_{n}$ and $μ_{1} \geq \dots \geq μ_{n}$, such that $λ_{i} \leq μ_{i}$. Since $f$, a functional form of a Schatten norm, is monotonic in each variable, $λ_{i} \leq μ_{i}$ implies $f (λ_{1}, \dots, λ_{i}, \dots, λ_{n}) \leq f (μ_{1}, \dots, μ_{i}, \dots, μ_{n})$. Given that implication and given that $f (γ μ_{1}, \dots, γ μ_{i}, \dots, γ μ_{n}) = γ f (μ_{1}, \dots, μ_{i}, \dots, μ_{n})$, the inequality $|δ_{i}| \leq \max [|β - 1|, |α - 1|] \cdot υ_{i}$, which is given by part (B) of Lemma 3.1, implies that $f (δ_{1}, \dots, δ_{n}) \leq \max [|α - 1|, |β - 1|] \cdot f (υ_{1}, \dots, υ_{n})$. Similarly, part (B) of Lemma 3.2 yields $|δ_{i}| \leq \max [|\frac{1}{β} - 1|, |\frac{1}{α} - 1|] \cdot χ_{i}$, which implies that $f (δ_{1}, \dots, δ_{n}) \leq \max [|\frac{1}{β} - 1|, |\frac{1}{α} - 1|] \cdot f (χ_{1}, \dots, χ_{n})$. Combining these two inequalities for $f (δ_{1}, \dots, δ_{n})$ with the definition $f (μ_{1}, \dots, μ_{n}) = ∥ M ∥$ yields $∥ X - Y ∥ \leq \min \{\max [|α - 1|, |β - 1|] \cdot ∥ Y ∥, \max [|\frac{1}{α} - 1|, |\frac{1}{β} - 1|] \cdot ∥ X ∥\}$.

Theorem 3.4: Consider (positive definite) Hermitian matrices X and Y such that the Thompson metric $d = d_{T} (X, Y)$ is finite. Let $λ_{1}, \dots, λ_{n}$ denote a collection of numbers such that $|λ_{i}| = \max \{\min [|χ_{i}|, |υ_{i}|], \min [|χ_{n - i + 1}|, |υ_{n - i + 1}|]\} \cdot \frac{e^{d} - 1}{e^{d}}$. Then $∥ X - Y ∥ \leq f (λ_{1}, \dots, λ_{n})$.

Proof:

Let $α = e^{d}$ . By definition of the Thompson metric, we have (i) X ≤ αY and (ii) Y ≤ αX. Let $β = α^{- 1} = e^{- d}$ . Then we have (iii) βX ≤ Y and (iv) βY ≤ X. Applying Lemma 3.1, part A to (i) and (iii) yields $(α - 1) \cdot υ_{i} \geq δ_{i} \geq (β - 1) \cdot υ_{i}$ . Noting that $β = \frac{1}{α}$ and $α = \frac{1}{β}$ , applying Lemma 3.2, part A to (i) and (iii) yields $(α - 1) \cdot χ_{i} \geq δ_{i} \geq (β - 1) \cdot χ_{i}$ . Similarly, recall that $E = Y - X$ , so reversing the roles of X and Y when applying Lemma 3.1, part A, and respectively Lemma 3.2, part A., to (ii) and (iv) yields $(α - 1) \cdot χ_{i} \geq ε_{i} \geq (β - 1) \cdot χ_{i}$ and yields $(α - 1) \cdot υ_{i} \geq ε_{i} \geq (β - 1) \cdot υ_{i}$ , respectively. Since $δ_{i} = - ε_{n - i + 1}$ , application of Lemmas 3.1 and 3.2 (part A) to (ii) and (iv) yield $- (α - 1) \cdot χ_{n - i + 1} \leq δ_{i} \leq - (β - 1) \cdot χ_{n - i + 1}$ and $- (α - 1) \cdot υ_{n - i + 1} \leq δ_{i} \leq - (β - 1) \cdot υ_{n - i + 1}$ . Since (by the definition of the Thompson metric) α ≥ 1 and hence $(α - 1) \geq (1 - β) = \frac{α - 1}{α}$ , $(β - 1) \cdot χ_{i} \leq δ_{i} \leq (1 - β) \cdot χ_{n - i + 1}$ and $(β - 1) \cdot υ_{i} \leq δ_{i} \leq (1 - β) \cdot υ_{n - i + 1}$ .

Substituting $(1 - β) = \frac{α - 1}{α} = \frac{e^{d} - 1}{e^{d}}$ yields $|δ_{i}| \leq \max \{\min [|χ_{i}|, |υ_{i}|], \min [|χ_{n - i + 1}|, |υ_{n - i + 1}|]\} \cdot \frac{e^{d} - 1}{e^{d}} = |λ_{i}|$. As in the proof of Theorem 3.3, the monotonicity of $f$ and the definition of $|λ_{i}|$ yield $∥ X - Y ∥ \leq f (λ_{1}, \dots, λ_{n})$.

Theorem 3.5: Consider (positive definite) Hermitian matrices X and Y such that the Thompson metric $d = d_{T} (X, Y)$ is finite. Then $∥ X - Y ∥_{p} \leq 2^{\frac{1}{p}} \frac{e^{d} - 1}{e^{d}} \max [∥ X ∥_{p}, ∥ Y ∥_{p}]$.

Proof:

Note that $\max \{\min [|χ_{i}|, |υ_{i}|], \min [|χ_{n - i + 1}|, |υ_{n - i + 1}|]\} \leq \max \{\max [|χ_{i}|, |υ_{i}|], \max [|χ_{n - i + 1}|, |υ_{n - i + 1}|]\} = \max [|χ_{i}|, |υ_{i}|, |χ_{n - i + 1}|, |υ_{n - i + 1}|] = \max \{\max [|χ_{i}|, |χ_{n - i + 1}|], \max [|υ_{i}|, |υ_{n - i + 1}|]\}$.

Since raising positive numbers to powers $\geq 1$ is monotonically increasing, a consequence of Theorem 3.4 is that ${|δ_{i}|}^{p} \leq {[\frac{e^{d} - 1}{e^{d}}]}^{p} \cdot \max \{\max [{|χ_{i}|}^{p}, {|χ_{n - i + 1}|}^{p}], \max [{|υ_{i}|}^{p}, {|υ_{n - i + 1}|}^{p}]\}$, which in turn is $\leq {[\frac{e^{d} - 1}{e^{d}}]}^{p} \cdot \max \{{|χ_{i}|}^{p} + {|χ_{n - i + 1}|}^{p}, {|υ_{i}|}^{p} + {|υ_{n - i + 1}|}^{p}\}$. Hence, by the definition and monotonicity of $f_{p}$, we have $∥ Δ ∥_{p}^{p} \leq {[\frac{e^{d} - 1}{e^{d}}]}^{p} \cdot \max \{\sum_{i = 1}^{n} {|χ_{i}|}^{p} + \sum_{i = 1}^{n} {|χ_{n - i + 1}|}^{p}, \sum_{i = 1}^{n} {|υ_{i}|}^{p} + \sum_{i = 1}^{n} {|υ_{n - i + 1}|}^{p}\}$. Since $\sum_{i = 1}^{n} {|χ_{i}|}^{p} = \sum_{i = 1}^{n} {|χ_{n - i + 1}|}^{p}$ and $\sum_{i = 1}^{n} {|υ_{i}|}^{p} = \sum_{i = 1}^{n} {|υ_{n - i + 1}|}^{p}$, it follows that $∥ Δ ∥_{p}^{p} \leq {[\frac{e^{d} - 1}{e^{d}}]}^{p} \cdot \max \{2 \sum_{i = 1}^{n} {|χ_{i}|}^{p}, 2 \sum_{i = 1}^{n} {|υ_{i}|}^{p}\}$, which by the definition of the Schatten p-norm yields $∥ X - Y ∥_{p}^{p} \leq 2 \cdot {[\frac{e^{d} - 1}{e^{d}}]}^{p} \max [∥ X ∥_{p}^{p}, ∥ Y ∥_{p}^{p}]$. Taking the $p$th root of both sides of the inequality yields the result.
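The bound of Theorem 3.5 can be spot-checked numerically. The sketch below is an illustration under stated assumptions, not the author's MATLAB code: it handles only real symmetric positive definite $2 \times 2$ matrices (so that closed-form eigenvalues suffice), and the function names are hypothetical. It evaluates $2^{1/p} (1 - e^{-d}) \max [∥ X ∥_{p}, ∥ Y ∥_{p}]$, using $\frac{e^{d} - 1}{e^{d}} = 1 - e^{-d}$, and compares it with $∥ X - Y ∥_{p}$.

```python
import math


def eig_sym2(m):
    """Eigenvalues (largest first) of a real symmetric 2x2 matrix [[a, b], [b, c]]."""
    (a, b), (_, c) = m
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return [mean + radius, mean - radius]


def schatten_norm(m, p):
    """Schatten p-norm of a symmetric 2x2 matrix; p = math.inf gives the spectral norm."""
    mags = [abs(v) for v in eig_sym2(m)]
    if math.isinf(p):
        return max(mags)
    return sum(v ** p for v in mags) ** (1.0 / p)


def thompson_metric(x, y):
    """log(alpha) for the smallest alpha with X <= alpha*Y and Y <= alpha*X."""
    (x11, x12), (_, x22) = x
    (y11, y12), (_, y22) = y
    qa = x11 * x22 - x12 * x12                      # det(X)
    qb = -(x11 * y22 + x22 * y11 - 2.0 * x12 * y12)
    qc = y11 * y22 - y12 * y12                      # det(Y)
    disc = math.sqrt(max(qb * qb - 4.0 * qa * qc, 0.0))
    t_max = (-qb + disc) / (2.0 * qa)               # largest eigenvalue of X^{-1} Y
    t_min = (-qb - disc) / (2.0 * qa)               # smallest eigenvalue of X^{-1} Y
    return math.log(max(t_max, 1.0 / t_min))


def theorem_3_5_bound(x, y, p):
    """Right-hand side of Theorem 3.5; 2**(1/p) is taken as 1 when p is infinite."""
    d = thompson_metric(x, y)
    root = 1.0 if math.isinf(p) else 2.0 ** (1.0 / p)
    return root * (1.0 - math.exp(-d)) * max(schatten_norm(x, p), schatten_norm(y, p))


if __name__ == "__main__":
    X = [[2.0, 1.0], [1.0, 2.0]]
    Y = [[3.0, 0.0], [0.0, 1.0]]
    delta = [[X[r][c] - Y[r][c] for c in range(2)] for r in range(2)]
    for p in (1, 2, math.inf):
        print(p, schatten_norm(delta, p), theorem_3_5_bound(X, Y, p))
```

In this example the two matrices share the eigenvalues 3 and 1 but not their eigenvectors, so the difference is non-zero while the norms coincide.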

4. The Frobenius (p = 2) case

We begin by noting that $\mathrm{tr} (A^{T} B)$ defines an inner product yielding the Frobenius norm, i.e. $∥ A ∥_{2} = \sqrt{\mathrm{tr} (A^{T} A)}$. This, together with the commutative property of the trace, leads to the following version of the law of cosines for matrices: $∥ A - B ∥_{2}^{2} = ∥ A ∥_{2}^{2} + ∥ B ∥_{2}^{2} - 2 \cdot \mathrm{tr} (A^{T} B)$. Since for two (symmetric) positive semidefinite matrices $X$ and $Y$, $\mathrm{tr} (X^{T} Y) = \mathrm{tr} (X Y) \geq 0$ (Yang, 2000; Yang, Yang, & Teo, 2001), the angle $θ = \cos^{- 1} [\frac{\mathrm{tr} (X Y)}{∥ X ∥_{2} ∥ Y ∥_{2}}]$ satisfies $θ \leq \frac{π}{2}$ rad and hence $∥ X - Y ∥_{2}^{2} \leq ∥ X ∥_{2}^{2} + ∥ Y ∥_{2}^{2}$. Note that the Frobenius norm of a matrix is the same as the Euclidean norm of that matrix reshaped as a vector, so matrices under the Frobenius norm can be treated just as vectors in a Euclidean space.

Let $d = d_{T} (X, Y)$ be the Thompson metric between (positive definite) matrices $X$ and $Y$ and let $α = e^{d}$. Note that in Figure 1, which represents each matrix as a vector, all the vectors shown are coplanar. The angle $θ$ is the same as the angle between $α X - Y = x + z$ and $α Y - X = y + w$. Since, by definition of the Thompson metric, $α X - Y$ and $α Y - X$ are both positive semidefinite, $θ \leq \frac{π}{2}$ rad, $\mathrm{tr} (x y) \geq 0$ and $\mathrm{tr} (w z) \geq 0$, whereas $\mathrm{tr} (x w) \leq 0$ and $\mathrm{tr} (y z) \leq 0$. Thus, we have

$$∥ Δ ∥_{2}^{2} \leq ∥ x ∥_{2}^{2} + ∥ y ∥_{2}^{2} \tag{1}$$

$$∥ α Δ ∥_{2}^{2} \leq ∥ w ∥_{2}^{2} + ∥ z ∥_{2}^{2} \tag{2}$$

$$∥ (α - 1) X ∥_{2}^{2} \geq ∥ y ∥_{2}^{2} + ∥ z ∥_{2}^{2} \tag{3}$$

$$∥ (α - 1) Y ∥_{2}^{2} \geq ∥ w ∥_{2}^{2} + ∥ x ∥_{2}^{2} . \tag{4}$$

Figure 1. Relation of the Thompson metric to the Frobenius norm. This figure represents matrices X and Y as vectors that span a plane and illustrates the geometric intuition behind inequalities (1)–(4).

Adding inequalities (1) and (2) as well as (3) and (4), we have

$$∥ Δ ∥_{2}^{2} + ∥ α Δ ∥_{2}^{2} \leq ∥ x ∥_{2}^{2} + ∥ y ∥_{2}^{2} + ∥ w ∥_{2}^{2} + ∥ z ∥_{2}^{2} \leq ∥ (α - 1) X ∥_{2}^{2} + ∥ (α - 1) Y ∥_{2}^{2} . \tag{5}$$

Thus

$$∥ Δ ∥_{2}^{2} + ∥ α Δ ∥_{2}^{2} \leq ∥ (α - 1) X ∥_{2}^{2} + ∥ (α - 1) Y ∥_{2}^{2} . \tag{6}$$

Solving for $∥ Δ ∥_{2}$ , we have our result

$$∥ Δ ∥_{2} \leq \frac{α - 1}{\sqrt{1 + α^{2}}} \sqrt{∥ X ∥_{2}^{2} + ∥ Y ∥_{2}^{2}} . \tag{7}$$
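As an algebraic cross-check on (7), the same bound can be recovered without reference to the figure, using only the fact cited at the start of this section that the trace of a product of two positive semidefinite matrices is non-negative. The following is a sketch under the assumption that $X$ and $Y$ are real symmetric (or Hermitian), so that $\mathrm{tr} (X^{2}) = ∥ X ∥_{2}^{2}$. It is offered as a consistency check and is not the geometric argument used above.

```latex
\begin{aligned}
0 \;&\le\; \operatorname{tr}\!\big[(\alpha X - Y)(\alpha Y - X)\big]
      \;=\; (\alpha^{2} + 1)\,\operatorname{tr}(XY)
            - \alpha\big(\lVert X\rVert_{2}^{2} + \lVert Y\rVert_{2}^{2}\big), \\[4pt]
\text{so}\quad \operatorname{tr}(XY) \;&\ge\; \frac{\alpha}{\alpha^{2} + 1}
      \big(\lVert X\rVert_{2}^{2} + \lVert Y\rVert_{2}^{2}\big), \\[4pt]
\lVert \Delta \rVert_{2}^{2}
   \;&=\; \lVert X\rVert_{2}^{2} + \lVert Y\rVert_{2}^{2} - 2\,\operatorname{tr}(XY)
   \;\le\; \Big(1 - \frac{2\alpha}{\alpha^{2} + 1}\Big)
           \big(\lVert X\rVert_{2}^{2} + \lVert Y\rVert_{2}^{2}\big)
   \;=\; \frac{(\alpha - 1)^{2}}{\alpha^{2} + 1}
           \big(\lVert X\rVert_{2}^{2} + \lVert Y\rVert_{2}^{2}\big).
\end{aligned}
```

Taking square roots of the last line gives the same right-hand side as (7).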

5. A generalization of the Frobenius (p = 2) case

Consider the related and more general problem of bounding the Frobenius norm $∥ Δ ∥_{2}$, with $Δ = X - Y$, given matrix bounds $X \leq g \cdot Y$ and $Y \leq f \cdot X$, for scalars $f$ and $g$. This generalization, illustrated in Figure 2, yields the following equations:

$$∥ Δ ∥_{2}^{2} = ∥ X ∥_{2}^{2} + ∥ Y ∥_{2}^{2} - 2 ∥ X ∥_{2} ∥ Y ∥_{2} \cdot \cos ϕ \tag{8}$$

$$∥ Δ_{f g} ∥_{2}^{2} = ∥ f \cdot X ∥_{2}^{2} + ∥ g \cdot Y ∥_{2}^{2} - 2 ∥ f \cdot X ∥_{2} ∥ g \cdot Y ∥_{2} \cdot \cos ϕ = f^{2} \cdot ∥ X ∥_{2}^{2} + g^{2} \cdot ∥ Y ∥_{2}^{2} - 2 f g ∥ X ∥_{2} ∥ Y ∥_{2} \cdot \cos ϕ . \tag{9}$$

Figure 2. Difference and Frobenius norm between two vectors given matrix bounds. This figure represents matrices X and Y as vectors that span a plane and illustrates the geometric intuition behind Equations (8) and (9) as well as inequality (10).

Similar to the argument above (in Section 4), $X \leq g \cdot Y$ and $Y \leq f \cdot X$ imply that $θ \leq \frac{π}{2}$ rad, which implies

$$∥ Δ ∥_{2}^{2} + ∥ Δ_{f g} ∥_{2}^{2} \leq {(∥ g \cdot Y ∥_{2} - ∥ Y ∥_{2})}^{2} + {(∥ f \cdot X ∥_{2} - ∥ X ∥_{2})}^{2} = {(f - 1)}^{2} \cdot ∥ X ∥_{2}^{2} + {(g - 1)}^{2} \cdot ∥ Y ∥_{2}^{2} . \tag{10}$$

The above system of two quadratic equations and one quadratic inequality has (assuming $∥ X ∥_{2}$ is known, even if $X$ is an unknown, approximated by $Y$) three unknowns: $∥ Δ ∥_{2}$, $∥ Δ_{f g} ∥_{2}$ and $\cos ϕ$. Solving this system and simplifying the resulting solutions with Mathematica (Wolfram Research, Inc., 2016) yields the following inequalities:

$$∥ Δ ∥_{2} \leq \sqrt{\frac{(1 + f (g - 2)) \cdot ∥ X ∥_{2}^{2} + (1 + g (f - 2)) \cdot ∥ Y ∥_{2}^{2}}{1 + f \cdot g}} \tag{11}$$

$$∥ Δ_{f g} ∥_{2} \leq \sqrt{\frac{[f^{2} \cdot ((f - 2) \cdot g + 1)] \cdot ∥ X ∥_{2}^{2} + [g^{2} \cdot ((g - 2) \cdot f + 1)] \cdot ∥ Y ∥_{2}^{2}}{1 + f \cdot g}} \tag{12}$$

$$\cos ϕ \geq \frac{f \cdot ∥ X ∥_{2}^{2} + g \cdot ∥ Y ∥_{2}^{2}}{(1 + f \cdot g) \cdot ∥ X ∥_{2} \cdot ∥ Y ∥_{2}} . \tag{13}$$

Note that this not only establishes a bound for $∥ X - Y ∥_{2}$, given matrix bounds $X \leq g \cdot Y$ and $Y \leq f \cdot X$, but this analysis also yields a bound for $\cos ϕ$. Thus, this analysis provides information about the inner product between $X$ and $Y$, even in cases where $X$ is an unknown, approximated by $Y$. Of course, when $f = g = α = e^{d}$, (11) simplifies to the result established in Section 4.
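The elimination behind the bounds on $∥ Δ ∥_{2}$ and $\cos ϕ$ can also be carried out by hand. The sketch below introduces the shorthand $a = ∥ X ∥_{2}^{2}$, $b = ∥ Y ∥_{2}^{2}$ and $c = ∥ X ∥_{2} ∥ Y ∥_{2} \cos ϕ$ (symbols chosen here for illustration, not used elsewhere in the paper) and assumes $f, g > 0$ so that dividing by $1 + f g$ preserves the direction of the inequality. It substitutes the two law-of-cosines identities (8) and (9) into inequality (10).

```latex
\begin{aligned}
(1 + f^{2})\,a + (1 + g^{2})\,b - 2(1 + fg)\,c
   \;&\le\; (f - 1)^{2} a + (g - 1)^{2} b
   \quad\Longrightarrow\quad
   c \;\ge\; \frac{f a + g b}{1 + fg}, \\[4pt]
\lVert \Delta \rVert_{2}^{2} \;=\; a + b - 2c
   \;&\le\; a + b - \frac{2\,(f a + g b)}{1 + fg}
   \;=\; \frac{\big(1 + f(g - 2)\big)\,a + \big(1 + g(f - 2)\big)\,b}{1 + fg}, \\[4pt]
f = g = \alpha:\qquad
\lVert \Delta \rVert_{2}^{2}
   \;&\le\; \frac{(\alpha - 1)^{2}}{1 + \alpha^{2}}\,(a + b).
\end{aligned}
```

The first line is the bound on $\cos ϕ$ after dividing by $∥ X ∥_{2} ∥ Y ∥_{2}$, the second is the square of the bound on $∥ Δ ∥_{2}$, and the third recovers the Frobenius bound of Section 4.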

6. Numerical results

Fifty calculations, performed in MATLAB, with pairs of random positive definite $5 \times 5$ matrices tested the tightness of the bounds presented in this paper. The following formulas generated the $i$th pair $(X_{i}, Y_{i})$ of matrices

$$X_{i} = (A + (i^{2} / 100) \cdot B) {(A + (i^{2} / 100) \cdot B)}^{T} + D \tag{14}$$

$$Y_{i} = (A + (i^{2} / 100) \cdot C) {(A + (i^{2} / 100) \cdot C)}^{T} + D, \tag{15}$$

where A, B and C have elements randomly drawn from the uniform distribution on [0,1] and D is a diagonal matrix with diagonal elements randomly drawn from that same distribution. The occurrence of the index i in the formula ensured a range of distances among the 50 matrix pairs tested.
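The construction in (14) and (15) can be sketched as follows. This is a standard-library Python illustration, not the author's MATLAB script. It assumes that A, B, C and D are each drawn once and reused for every index i (the text does not say whether they were redrawn per pair), and the names `rand_matrix`, `gram_plus_diag` and `make_pairs` as well as the seed are hypothetical.

```python
import random


def rand_matrix(n, rng):
    """n x n matrix with entries drawn uniformly from [0, 1)."""
    return [[rng.random() for _ in range(n)] for _ in range(n)]


def gram_plus_diag(m, d):
    """Return M M^T + diag(d) for a square matrix M and a list d."""
    n = len(m)
    out = [[sum(m[r][k] * m[c][k] for k in range(n)) for c in range(n)]
           for r in range(n)]
    for r in range(n):
        out[r][r] += d[r]
    return out


def make_pairs(count=50, n=5, seed=0):
    """Generate (X_i, Y_i) for i = 1..count following Equations (14) and (15)."""
    rng = random.Random(seed)
    a, b, c = (rand_matrix(n, rng) for _ in range(3))
    d = [rng.random() for _ in range(n)]          # diagonal of D
    pairs = []
    for i in range(1, count + 1):
        s = i * i / 100.0                         # the factor i^2 / 100
        mx = [[a[r][k] + s * b[r][k] for k in range(n)] for r in range(n)]
        my = [[a[r][k] + s * c[r][k] for k in range(n)] for r in range(n)]
        pairs.append((gram_plus_diag(mx, d), gram_plus_diag(my, d)))
    return pairs


if __name__ == "__main__":
    pairs = make_pairs()
    x1, y1 = pairs[0]
    print(len(pairs), len(x1), len(x1[0]))
```

Each matrix has the form $M M^{T} + D$ with a non-negative diagonal $D$, so it is symmetric positive semidefinite by construction and positive definite whenever the diagonal entries of $D$ are strictly positive.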

Figure 3 compares values of $∥ X_{i} - Y_{i} ∥_{p}$ for (A) p = 1 (trace norm), (B) p = 2 (Frobenius norm) and (C) p = ∞ (spectral norm) with bounds for those values calculated using Theorem 3.5 and (for panel B) Equations (7) and (11). The function thompson_metric.m, used to calculate the Thompson metric as well as f and g in Equation (11), is available via MATLAB Central File Exchange. The script used to generate Figure 3, and the data calculated using that script, are available from the author upon request. While the bounds described in this paper are clearly not very tight (for matrices more distant from each other), hopefully these results will spark further research leading to tighter bounds on Schatten norms based on the Thompson metric.

Figure 3. Comparison, for 50 pairs of matrices, of values of bounds for Schatten metrics derived in this paper vs. the values of the corresponding Schatten metric: (A) bound given by Theorem 3.5 for the trace norm vs. the trace norm itself, (B) bounds for the Frobenius norm vs. the Frobenius norm itself and (C) bound given by Theorem 3.5 for the spectral norm vs. the spectral norm itself. In panel (B), bounds given by Theorem 3.5 are indicated with cyan markers, bounds given by Equation (7) are indicated with magenta markers, and bounds given by Equation (11) are indicated with black markers.

7. Discussion

Weyl’s inequalities, and hence some knowledge of the spectra of $X$ and $Y$, form the backbone of the proofs presented above. In the motivating case where $Y$ is an approximation of an unknown $X$, the spectrum of $X$ may also be unknown. While the principal result of this paper ultimately only requires knowledge of $∥ X ∥_{p}$ (as well as $∥ Y ∥_{p}$, which is generally known), purely geometric/trigonometric proofs, such as the one given for the Frobenius case, of the results presented in this paper would be more elegant given the nature of the motivating problem.

Furthermore, proofs not based on the matrix structure of $X$ and $Y$ but based purely on the ordering (Löwner ordering in this case) and norm (Schatten p-norm) being compared might allow for tighter bounds on $∥ X - Y ∥_{p}$ even in the absence of any knowledge of the spectrum of $X$ (or even of $Y$ , for that matter), other than perhaps a restriction that $X$ and $Y$ be positive semidefinite. In comparison, Theorem 3.4 provides a tighter bound on $∥ X - Y ∥_{p}$ than the main result (Theorem 3.5), but it requires some knowledge of the spectrum of $X$ (at least that its eigenvalues are lower in magnitude than the corresponding eigenvalues of $Y$ ).

Additionally, proofs not based on the matrix structure of $X$ and $Y$ may lead to the generalization of these results to other orderings, which can also induce Thompson metrics (Cobzaş & Rus, 2014), and other norms. For instance, since the Frobenius norm arises from an inner product, a geometrically flavored argument leads to a slightly tighter bound on $∥ X - Y ∥_{2}$ than that obtained from the general bound for $∥ X - Y ∥_{p}$ by setting $p = 2$. On the other hand, the already established general result for a Thompson metric induced by a normal cone in a Banach space (Lemmens & Nussbaum, 2012; Nussbaum, 1988) is not as tight as the main result (Theorem 3.5) presented here: as $ϵ \to 0$, the value of $δ$ such that $0 \leq X \leq (X + ϵ I) \Rightarrow ∥ X ∥ \leq δ ∥ X + ϵ I ∥$ approaches unity; thus, the normality constant for the cone of positive semidefinite matrices is unity, and the general result for Banach spaces reduces to $∥ X - Y ∥ \leq 3 (e^{d_{T} (X, Y)} - 1) \max [∥ X ∥, ∥ Y ∥]$. The right-hand side of this inequality is clearly greater than $2^{\frac{1}{p}} \frac{e^{d} - 1}{e^{d}} \max [∥ X ∥_{p}, ∥ Y ∥_{p}]$, since $2^{\frac{1}{p}} \leq 2 < 3$ and $e^{d} \geq 1$ because the (Thompson) metric $d$ is non-negative.
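To give a feel for the size of the gap between the two prefactors compared above, the following sketch evaluates $(1 + 2 δ)(e^{d} - 1)$ with $δ = 1$, as stated in the text for the positive semidefinite cone, against $2^{1/p} (e^{d} - 1) / e^{d}$. The function names are hypothetical, and `math.expm1` is used only to keep the subtraction accurate for small $d$.

```python
import math


def general_prefactor(d, delta=1.0):
    """(1 + 2*delta) * (e^d - 1): prefactor of the general Banach-space bound."""
    return (1.0 + 2.0 * delta) * math.expm1(d)


def schatten_prefactor(d, p):
    """2^(1/p) * (e^d - 1) / e^d = 2^(1/p) * (1 - e^(-d)): prefactor of Theorem 3.5."""
    root = 1.0 if math.isinf(p) else 2.0 ** (1.0 / p)
    return root * -math.expm1(-d)


if __name__ == "__main__":
    for d in (0.01, 0.1, 1.0, 3.0):
        print(d, general_prefactor(d), schatten_prefactor(d, 2))
```

For small $d$ the two prefactors differ by a roughly constant ratio, whereas for large $d$ the general prefactor grows like $e^{d}$ while the Schatten prefactor stays below $2^{1/p}$.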

As illustrated in Section 5 of this paper, a more general analysis of the Frobenius case yields not only a bound for $∥ X - Y ∥_{2}$ but also bounds on the inner product between $X$ and $Y$. In the case where $X$ is an unknown, approximated by $Y$, bounds on the inner product between $X$ and $Y$ further quantify how well $Y$ approximates $X$, and may provide further insight into improved approximations of an unknown $X$.

Hopefully, future research can further generalize the analysis presented in Section 5 to cases where $X \leq g (Y)$ and $Y \leq f (X)$, for more general classes of functions on $X$ and $Y$ than mere scalar multiplication. Such inequalities, in the Löwner order, arise, for example, in characterizing approximate solutions to the continuous algebraic Riccati equation (Zhang & Liu, 2010). Further generalization of the results presented here will facilitate expressing the quality of approximations, found in many areas of matrix algebra and optimization theory, in terms of geometrically intuitive metrics based on Schatten norms rather than less geometrically intuitive bounds in the Löwner order.

Acknowledgements

This work was made possible by the support of William Paterson University of NJ’s Office of the Provost for Assigned Release Time for research. The author also thanks Rajendra Bhatia as well as Yongdo Lim for their advice about where to submit this work and an anonymous reviewer for referring me to Lemmens’ and Nussbaum’s related results in nonlinear Perron-Frobenius theory.

Disclosure Statement

The author has no competing interests to declare.

Additional information

Funding

This work was not funded by any external granting agency or funding source. The author thanks the Department of Chemistry at William Paterson University for providing funding to maintain the author’s MATLAB license.

Notes on contributors

David A. Snyder

David A. Snyder is a professor in the Department of Chemistry at William Paterson University of NJ. He received a Bachelor of Science – double majoring in Mathematics and Biology – from the University of California, Irvine and a Ph.D. in Biochemistry from Rutgers University. Prior to starting his current position, Dr. Snyder was a post-doctoral researcher in Rafael Brüschweiler’s (then at Florida State University) research group. Dr. Snyder’s research interests include quantifying protein flexibility and covariance NMR, which comprises a family of techniques that use algebraic operations to enhance and combine NMR spectra. In the course of researching the accuracy of covariance NMR, Dr. Snyder found that many relevant algebraic results involved Thompson metrics rather than more intuitive metrics based on Schatten norms. The sparsity of mathematical literature relating Thompson metrics and Schatten norms prompted Dr. Snyder to prove the results presented in this paper.

References

Baksalary, J. K., & Pukelsheim, F. (1991). On the Löwner, minus, and star partial orderings of nonnegative definite matrices and their squares. Linear Algebra and Its Applications, 151, 135–141. doi:10.1016/0024-3795(91)90359-5
Bhatia, R. (2007). Perturbation bounds for matrix eigenvalues. Philadelphia, PA: SIAM.
Carli, F. P., & Sepulchre, R. (2015). On the projective geometry of Kalman filter. Proceedings of the 54th Annual IEEE Conference on Decision and Control (CDC) (pp. 2420–2425). IEEE. doi:10.1109/CDC.2015.7402570
Cobzaş, Ş., & Rus, M. (2014). Normal cones and Thompson metric. In T. Rassias & L. Tóth (Eds.), Topics in mathematical analysis and applications. Springer optimization and its applications (Vol. 94, pp. 209–258). New York, NY: Springer.
Del Moral, P., Kurtzmann, A., & Tugaut, J. (2017). On the stability and the uniform propagation of chaos of a class of extended ensemble Kalman-Bucy filters. SIAM Journal on Control and Optimization, 55, 119–155. doi:10.1137/16M1087497
Gaubert, S., & Qu, Z. (2014). The contraction rate in Thompson’s part metric of order-preserving flows on a cone – Application to generalized Riccati equations. Journal of Differential Equations, 256(8), 2902–2948. doi:10.1016/j.jde.2014.01.024
Lawson, J., & Lim, Y. (2007). A Birkhoff contraction formula with applications to Riccati equations. SIAM Journal on Control and Optimization, 46(3), 930–951. doi:10.1137/050637637
Lee, H., & Lim, Y. (2008). Invariant metrics, contractions and nonlinear matrix equations. Nonlinearity, 21(4), 857. doi:10.1088/0951-7715/21/4/011
Lemmens, B., & Nussbaum, R. D. (2012). Nonlinear Perron-Frobenius theory (Cambridge Tracts in Mathematics, 189). New York, NY: Cambridge University Press.
Lemmens, B., & Roelands, M. (2015). Unique geodesics for Thompson’s metric [Les géodésiques uniques de la métrique de Thompson]. Annales de l’Institut Fourier, 65, 315–348.
Liao, A., Yao, G., & Duan, X. (2010). Thompson metric method for solving a class of nonlinear matrix equation. Applied Mathematics and Computation, 216(6), 1831–1836. doi:10.1016/j.amc.2009.12.022
MathWorks, Inc. (2017). MATLAB. MathWorks.
Montrucchio, L. (1998). Thompson metric, contraction property and differentiability of policy functions. Journal of Economic Behavior & Organization, 33(3), 449–466. doi:10.1016/S0167-2681(97)00069-3
Nussbaum, R. D. (1988). Hilbert’s projective metric and iterated nonlinear maps (AMS Memoirs, Vol. 75, Number 391). Providence, RI: American Mathematical Society.
Nussbaum, R. D., & Walsh, C. (2004). A metric inequality for the Thompson and Hilbert geometries. Journal of Inequalities in Pure and Applied Mathematics, 5. http://emis.ams.org/journals/JIPAM/images/131_03_JIPAM/131_03_www.pdf
Qu, Z. (2014). Contraction of Riccati flows applied to the convergence analysis of a max-plus curse-of-dimensionality–Free method. SIAM Journal on Control and Optimization, 52(5), 2677–2706. doi:10.1137/130906702
Weyl, H. (1949). Inequalities between the two kinds of eigenvalues of a linear transformation. Proceedings of the National Academy of Sciences of the United States of America, 35(7), 408–411. doi:10.1073/pnas.35.7.408
Wolfram Research, Inc. (2016). Mathematica. Champaign, IL: Wolfram Research, Inc.
Yang, X. (2000). A matrix trace inequality. Journal of Mathematical Analysis and Applications, 250(1), 372–374. doi:10.1006/jmaa.2000.7068
Yang, X. M., Yang, X. Q., & Teo, K. L. (2001). A matrix trace inequality. Journal of Mathematical Analysis and Applications, 263(1), 327–331. doi:10.1006/jmaa.2001.7613
Zhang, J., & Liu, J. (2010). Matrix bounds for the solution of the continuous algebraic Riccati equation. Mathematical Problems in Engineering, 2010, 1–15. doi:10.1155/2010/819064
