![MathJax Logo](/templates/jsp/_style2/_tandf/pb2/images/math-jax.gif)
Abstract
The Thompson metric provides key geometric insights in the study of non-linear matrix equations and in many optimization problems. However, knowing that an approximate solution is within units, in the Thompson metric, of the actual solution provides little insight into how good the approximation is as a matrix or vector approximation. That is, bounding the Thompson metric between an approximate and accurate solution to a problem does not provide obvious bounds either for the spectral or the Frobenius norm, both Schatten norms, of the difference between the approximation and accurate solution. This paper reports such an upper bound, namely that
where
denotes the Schatten p-norm and
denotes the Thompson metric between
and
. Furthermore, a more geometric proof leads to a slightly better bound in the case of the Frobenius norm,
.
PUBLIC INTEREST STATEMENT
Metrics are functions, mapping pairs of points to non-negative real numbers, which generalize the concept of distance to apply to abstract spaces. The Thompson metric provides critical geometric insights into dynamical systems, optimization problems and solving systems of equations. However, the Thompson metric is not an intuitive generalization of the concept of distance. The Thompson metric does provide an upper bound for more intuitive measurements of distance, such as those based on Schatten norms, but the currently known relation between Thompson metrics and more intuitive generalizations of distance is not always a tight bound. This paper presents a tighter bound relating metrics based on Schatten norms to Thompson metrics. The results in this paper can refine our geometrical understanding of problems arising in fluid mechanics, in geophysics as well as in robotics, and may improve assessments of the quality of data and image processing techniques.
1. Introduction
The Thompson metric is a variant of the Hilbert metric (Nussbaum & Walsh, Citation2004). The Hilbert metric generalizes the metric structure of hyperbolic geometry to the generalized concept of cones used in the study of Banach (complete normed vector) spaces, such as the space of Hermitian matrices. When applied to the unit disk, the Hilbert metric yields the Klein model of hyperbolic geometry, but when applied to a cone, such as the cone of positive definite or positive semidefinite matrices, the Hilbert metric is actually a pseudometric. A slight tweak of the Hilbert metric yields the Thompson (part) metric: the Thompson metric is the minimal
such that both
and
are both positive semidefinite. The Thompson metric is well defined over the cone of positive definite matrices but may be infinite when applied to other matrices, such as positive semidefinite matrices.
The Thompson metric (Lemmens & Roelands, Citation2015; Nussbaum & Walsh, Citation2004) provides key geometric insights into the study of non-linear matrix equations. In particular, many flows, which in other metrics may not even be contractions, have well-characterized contraction rates in the Thompson metric (Lee & Lim, Citation2008). That flows arising in many non-linear optimization, filtering and control problems are contractions in the Thompson metric (Carli & Sepulchre, Citation2015; Del Moral, Kurtzmann, & Tugaut, Citation2017; Gaubert & Qu, Citation2014; Lawson & Lim, Citation2007; Qu, Citation2014) endows this metric with great utility. Applications of the Thompson metric range from proofs of the existence and uniqueness of positive definite solutions for many types of non-linear equations (Liao, et al., Citation2010) to non-linear optimization theory (Gaubert & Qu, Citation2014; Montrucchio, Citation1998) and nonlinear Perron-Frobenius theory (Lemmens & Nussbaum, Citation2012; Nussbaum, Citation1988). Relatedly, matrix bounds in the Löwner order characterize the error in approximate solutions to continuous algebraic Riccati equation (Zhang & Liu, Citation2010).
While the Thompson metric is convenient for solving many optimization problems involving matrices, it is often more intuitive to view matrices solving such problems within more typical geometric contexts. Knowing that the solution of a problem and its nth approximation
are
units apart in the Thompson metric provides little indication of how close
is to
, i.e. knowing that
and
in the Löwner ordering (Baksalary & Pukelsheim, Citation1991), where
, does not intuitively bound
for any of the usual matrix norms
. But it is
in a suitable matrix norm, not
, or similar expressions relating
and
in the Löwner ordering, that provides insight as to the quality of an approximation
.
In particular, considering the matrices and
as linear operators on Euclidean vector spaces, the spectral norm, i.e. a Schatten p-norm with
, of
is the relevant measure of how well
approximates
. Considering these matrices as themselves vectors in a Euclidean space, then the relevant assessment of how well
approximates
is the Frobenius norm, i.e. a Schatten norm with
, of
. Therefore, it is useful to know an upper bound for the Schatten p-norm
given some minimal information about
(e.g. its norm) as well as the Thompson metric
. For a cone with normality constant
, in a Banach Space, the following inequality holds (Lemmens & Nussbaum, Citation2012; Nussbaum, Citation1988):
. However, this inequality does not preclude the existence of tighter bounds relating specific norms and Thompson metrics such as Schatten p-norms and the Thompson metric induced by the Löwner order on the cone of positive semidefinite matrices.
This paper thus seeks to fill this important gap in our understanding of the relationship between Thompson metrics and Schatten norms by providing an upper bound for the Schatten p-norm given the Thompson metric
as well as the Schatten p-norms of X and Y. In particular, the application of Weyl’s inequalities establishes that
. Hopefully, this paper will serve as the beginning of a conversation leading to ever tighter bounds on
given
as well as minimal information about
and
, such as their norms and perhaps some knowledge of their spectra of eigenvalues.
2. Preliminaries
This paper will generally use a consistent set of letters and symbols to denote certain matrices and their norms and eigenvalues. Let and
each denote positive definite Hermitian matrices with eigenvalues
and
, respectively. While the proofs presented in this paper do not explicitly require the matrices be positive definite, in such cases the Thompson metric may be infinite, when the matrix is not positive definite the results presented here are trivial as any finite metric is
. Thus, this paper will focus on positive definite matrices
and
. Denote the eigenvalues of the matrix
by
and those of
by
. Note that
.
denotes a Schatten norm of the matrix
and
specifically denotes the Schatten p—norm (which is a norm for p such that
). Note that
is a function of the eigenvalues
of
:
. Similarly, this paper will use the notation of
as the functional form of
. Depending on the context,
and
denote either the usual ordering on real numbers or the Löwner ordering on matrices: i.e.
indicates that
is positive semidefinite. In terms of the Löwner ordering, the Thompson metric
is the minimal
such that
and
(Nussbaum & Walsh, Citation2004). As is standard,
denotes the trace of the matrix
.
Key to the proofs in this paper are the well-established Weyl’s inequalities (Bhatia, Citation2007; Weyl, Citation1949) for the eigenvalues of Hermitian matrices: let M, Y and P be Hermitian matrices such that . Denote the eigenvalues of M, Y and P by
,
, and
, respectively. Then, (Weyl’s inequalities)
.
Use of Mathematica (Wolfram Research I, Citation2016) proved invaluable in simplifying the equations and inequalities presented in this paper. Numerical results were calculated using MATLAB (MathWorks I, Citation2017).
3. Proof of general case
The proof begins with a lemma applying Weyl’s inequalities to bound the eigenvalues of by the eigenvalues of Y given upper and lower bounds for X in the Löwner ordering. The second lemma, a consequence of the first lemma, bounds the eigenvalues of
by the eigenvalues of X .
Lemma 3.1: Consider (positive definite) Hermitian matrices X and Y such that X ≤ αY and X ≥ βY. Then (A) and (B)
.
Proof:
Note that . Let α1 be the maximum eigenvalue of
(which is negative semi-definite as
is positive semidefinite by the definition of X ≤ αY) and βn be the minimum eigenvalue of
(which is positive semidefinite by the definition of X ≥ βY). By hypothesis, α1 ≤ 0 and βn ≥ 0. Note that the eigenvalues for
and
are
,…,
and
,…,
, respectively. By Weyl’s inequalities, we have
. Since α1 ≤0 and βn ≥ 0, we have
and hence
.
Lemma 3.2: Again, consider (positive definite) Hermitian matrices X and Y such that X ≤ αY and X ≥ βY. Then (A) and (B)
.
Proof:
X ≤ αY and X ≥ βY respectively imply and
. Apply Lemma 3.1 to Y (in place of X), X (in place of Y),
(in place of β) and
(in place of α).
Theorem 3.3: Consider (positive definite) Hermitian matrices X and Y such that X ≤ αY and X ≥ βY.
Proof:
Consider two sets of eigenvalues, and
such that
. Since f, a functional form of a Schatten norm, is monotonic in each variable, λi ≤ μi implies
. Given that implication and given that
,
, which is given by part (B) of Lemma 3.1, implies that
. Similarly part (B) of Lemma 3.2 yields
, which implies that
. Combining these two inequalities for
with the definition of
yields
Theorem 3.4: Consider (positive definite) Hermitian matrices X and Y such that Thompson metric is finite. Let
denote a collection of numbers such that
. Then
Proof:
Let . By definition of the Thompson metric, we have (i) X ≤ αY and (ii) Y ≤ αX. Let
. Then we have (iii) βX ≤ Y and (iv) βY ≤ X. Applying Lemma 3.1, part A to (i) and (iii) yields
. Noting that
and
, applying Lemma 3.2, part A to (i) and (iii) yields
. Similarly, recall that
, so reversing the roles of X and Y when applying Lemma 3.1, part A, and respectively Lemma 3.2, part A., to (ii) and (iv) yields
and yields
, respectively. Since
, application of Lemmas 3.1 and 3.2 (part A) to (ii) and (iv) yield
and
. Since (by the definition of the Thompson metric) α ≥ 1 and hence
,
and
.
Substituting . Thus,
. As in the proof of Theorem 3.3, the monotonicity of
and definition of
yield
.
Theorem 3.5: Consider (positive definite) Hermitian matrices X and Y, i.e. such that Thompson metric is finite. Then
.
Proof:
Note that
Since raising positive numbers to powers is monotonically increasing, a consequence of Theorem 3.4 is that
, which in turn is
. Hence, by the definition of and monotonicity of
, we have
. Since
and
,
, which by definition of the Schatten p-norm yields
. Taking the p’th root of both sides of the inequality yields the result.
4. The Frobenius (p = 2) case
We begin by noting that defines an inner product yielding the Frobenius norm, i.e.
. This, together with the commutative property of the trace, leads to the following version of the law of cosines for matrices:
. Since for two (symmetric) positive semidefinite matrices
and
,
(Yang, Citation2000; Yang, Yang, & Teo, Citation2001),
and hence
. Note that the Frobenius norm of a matrix is the same as the Euclidean norm of that matrix reshaped as a vector, so matrices under the Frobenius norm can be treated just as vectors in a Euclidean space.
Let be the Thompson metric between (positive definite) matrices
and
and let
. Note that in Figure , which represents each matrix as a vector, all the vectors shown are coplanar. The angle
is the same as the angle between
and
. Since, by definition of the Thompson metric,
and
are both positive semidefinite,
and
and
whereas
and
. Thus, we have
Figure 1. Relation of the Thompson metric to the Frobenius norm. This figure represents matrices X and Y as vectors that span a plane and illustrates the geometric intuition behind inequalities (1)–(4).
![Figure 1. Relation of the Thompson metric to the Frobenius norm. This figure represents matrices X and Y as vectors that span a plane and illustrates the geometric intuition behind inequalities (1)–(4).](/cms/asset/27c67180-36f4-4848-8683-aa4d81de24db/oama_a_1614318_f0001_oc.jpg)
Adding inequalities (1) and (2) as well as (3) and (4), we have
Thus
Solving for , we have our result
5. A generalization of the Frobenius (p = 2) case
Consider the related and more general problem of bounding the Frobenius norm , with
, given matrix bounds
and
, for scalars
and
. This generalization, illustrated in Figure , yields the following equations:
Figure 2. Difference and Frobenius norm between two vectors give matrix bounds. This figure represents matrices X and Y as vectors that span a plane and illustrates the geometric intuition behind Equations (8) and (9) as well as inequality (10).
![Figure 2. Difference and Frobenius norm between two vectors give matrix bounds. This figure represents matrices X and Y as vectors that span a plane and illustrates the geometric intuition behind Equations (8) and (9) as well as inequality (10).](/cms/asset/cc37c10d-a6ed-45d9-9790-5c6ed6814790/oama_a_1614318_f0002_oc.jpg)
Similar to the argument above (in part 4), and
imply that
, which implies
The above system of two quadratic equations and one quadratic inequality has (assuming is known, even if
is an unknown, approximated by
) three unknowns:
,
and
. Solving this system and simplifying the resulting solutions with Mathematica (Wolfram Research I, Citation2016) yields the following inequalities:
Note that this not only establishes a bound for , given matrix bounds
and
, but this analysis also yields a bound for
. Thus, this analysis provides information about the inner product between
and
, even in cases where
is an unknown, approximated by
. Of course, when
, (11) simplifies to the result established in Section 4.
6. Numerical results
Fifty calculations, performed in MATLAB, with pairs of random positive definite matrices tested the tightness of the bounds presented in this paper. The following formulas generated the ith pair
of matrices
where A, B and C have elements randomly drawn from the uniform distribution on [0,1] and D is a diagonal matrix with diagonal elements randomly drawn from that same distribution. The occurrence of the index i in the formula ensured a range of distances among the 50 matrix pairs tested.
Figure compares values of for (A) p = 1 (trace norm), (B) p = 2 (Frobenius norm) and (C) p = ∞ (spectral norm) with bounds for those values calculated using Theorem 3.5, (and for panel B) Equation (7) and Equation (11). The function, thompson_metric.m, used to calculate the Thompson metric as well as f and g in Equation (11) is available via MATLAB Central File Exchange, and the script, and data calculated using that script, used to generate Figure is available from the author upon request. While the bounds described in this paper are clearly not very tight (for matrices more distant from each other), hopefully, these results will spark further research leading to tighter bounds on Schatten norms based on the Thompson metric.
Figure 3. Comparison, for 50 pairs of matrices, of values of bounds for Schatten metrics derived in this paper vs. the values of the corresponding Schatten metric: (A) bound given by Theorem. 3.5 for the trace norm vs. the trace norm itself, (B) bounds for the Frobenius norm vs. the Frobenius norm itself and (C) bound given by Theorem. 3.5 for the spectral norm vs. the spectral norm itself. In panel (B), bounds given by Theorem. 3.5 are indicated with cyan markers, bounds given by Equation (7) are indicated with magenta markers, and bounds given by Equation (11) are indicated with black markers.
![Figure 3. Comparison, for 50 pairs of matrices, of values of bounds for Schatten metrics derived in this paper vs. the values of the corresponding Schatten metric: (A) bound given by Theorem. 3.5 for the trace norm vs. the trace norm itself, (B) bounds for the Frobenius norm vs. the Frobenius norm itself and (C) bound given by Theorem. 3.5 for the spectral norm vs. the spectral norm itself. In panel (B), bounds given by Theorem. 3.5 are indicated with cyan markers, bounds given by Equation (7) are indicated with magenta markers, and bounds given by Equation (11) are indicated with black markers.](/cms/asset/bbfe33b6-ef2e-4c68-a7e9-f0eba92734f4/oama_a_1614318_f0003_oc.jpg)
7. Discussion
Weyl’s inequalities, and hence some knowledge of the spectra of and
, form the backbone of the proofs presented above. In the motivating case where
is an approximation of an unknown
, the spectrum of
may also be unknown. While the principle result of this paper ultimately only requires knowledge of
(as well as
, which is generally known), purely geometric/trigonometric proofs, such as the one given for the Frobenius case, of the results presented in this paper would be more elegant given the nature of the motivating problem.
Furthermore, proofs not based on the matrix structure of and
but based purely on the ordering (Löwner ordering in this case) and norm (Schatten p-norm) being compared might allow for tighter bounds on
even in the absence of any knowledge of the spectrum of
(or even of
, for that matter), other than perhaps a restriction that
and
be positive semidefinite. In comparison, Theorem 3.4 provides a tighter bound on
than the main result (Theorem 3.5), but it requires some knowledge of the spectrum of
(at least that its eigenvalues are lower in magnitude than the corresponding eigenvalues of
).
Additionally, proofs not based on the matrix structure of and
may lead to the generalization of these results in other orderings, which can also induce Thompson metrics (Cobzaş & Rus, Citation2014), and other norms. For instance, since the Frobenius norm arises from an inner product, a geometrically flavored argument leads to a slightly tighter bound on
than obtained from the general bound for
and setting
. On the other hand, the already established general result for a Thompson metric induced by a normal cone in a Banach space (Lemmens & Nussbaum, Citation2012; Nussbaum, Citation1988) is not as tight as the main result (Theorem 3.5) presented here: as
the value of
such that
approaches unity; thus, the normality constant for the cone of positive semidefinite matrices is unity, and the general result for Banach spaces reduces to
, the right-hand side of which inequality is clearly greater than
since
and
since the (Thompson) metric
is non-negative.
As illustrated in Section 5 of this paper, more general analysis of the Frobenius case yields not only a bound for but also bounds inner product between
and
. In the case where
is an unknown, approximated by
, bounds on the inner product between
and
further quantify how well
approximates
, and may provide further insight into improved approximations of an unknown
.
Hopefully, future research can further generalize the analysis presented in Section 5 to cases where and
, for more general classes of functions on
and
than mere scalar multiplication. Such inequalities, in the Löwner order, arise, for example, in characterizing approximate solutions to the continuous algebraic Riccati equation (Zhang & Liu, Citation2010). Further generalization of the results presented here will facilitate expressing the quality of approximations, found in many areas of matrix algebra and optimization theory, in terms of geometrically intuitive metrics based on Schatten norms rather than less geometrically intuitive bounds in the Löwner order.
Acknowledgements
This work was made possible by the support of William Paterson University of NJ’s Office of the Provost for Assigned Release Time for research. The author also thanks Rajendra Bhatia as well as Yongdo Lim for their advice about where to submit this work and an anonymous reviewer for referring me to Lemmens’ and Nussbaum’s related results in nonlinear Perron-Frobenius theory.
Disclosure Statement
The author has no competing interests to declare.
Additional information
Funding
Notes on contributors
David A. Snyder
David A. Snyder is a professor in the Department of Chemistry at William Paterson University of NJ. He received a Bachelor of Science – double majoring in Mathematics and Biology – from the University of California, Irvine and a Ph.D. in Biochemistry from Rutgers University. Prior to starting his current position, Dr. Snyder was a post-doctoral researcher in Rafael Brüschweiler’s (then at Florida State University) research group. Dr. Snyder’s research interests include quantifying protein flexibility and covariance NMR, which comprises a family of techniques that use algebraic operations to enhance and combine NMR spectra. In the course of researching the accuracy of covariance NMR, Dr. Snyder found that many relevant algebraic results involved Thompson metrics rather than more intuitive metrics based on Schatten norms. The sparsity of mathematical literature relating Thompson metrics and Schatten norms prompted Dr. Snyder to prove the results presented in this paper.
References
- Baksalary, J. K., & Pukelsheim, F. (1991). On the Löwner, minus, and star partial orderings of nonnegative definite matrices and their squares. Linear Algebra and Its Applications, 151, 135–10. doi:10.1016/0024-3795(91)90359-5
- Bhatia, R. (2007). Perturbation bounds for matrix eigenvalues. Philadelphia, PA: SIAM.
- Carli, F. P., & Sepulchre, R. (2015). On the projective geometry of Kalman filter. Proceedings of the 54th Annual IEEE Conference on Decision and Control (CDC) (2420–2425). IEEE Online: IEEE-Xplore 2016 doi:10.1109/CDC.2015.7402570.
- Cobzaş, Ş. & Rus, M. (2014). Normal cones and Thompson metric. In T. Rassias & L. Tóth (Eds.), Topics in mathematical analysis and applications. Springer optimization and its applications (Vol. 94, pp. 209–258). New York, NY: Springer.
- Del Moral, P., Kurtzmann, A., & Tugaut, J. (2017). On the stability and the uniform propagation of chaos of a class of extended ensemble Kalman-Bucy filters. SIAM Journal on Control and Optimization, 55, 119–155. doi:10.1137/16M1087497
- Gaubert, S., & Qu, Z. (2014). The contraction rate in Thompson’s part metric of order-preserving flows on a cone – Application to generalized Riccati equations. Journal of Differential Equations, 256(8), 2902–2948. doi:10.1016/j.jde.2014.01.024
- Lawson, J., & Lim, Y. (2007). A Birkhoff contraction formula with applications to Riccati equations. SIAM Journal on Control and Optimization, 46(3), 930–951. doi:10.1137/050637637
- Lee, H., & Lim, Y. (2008). Invariant metrics, contractions and nonlinear matrix equations. Nonlinearity, 21(4), 857. doi:10.1088/0951-7715/21/4/011
- Lemmens, B., & Nussbaum, R. D. (2012). Nonlinear Perron-Frobenius theory. (Cambridge Tracts in Mathematics; 189). New York (NY): Cambridge University Press.
- Lemmens, B., & Roelands, M. (2015). Unique geodesics for Thompson’s metric [Les géodésiques uniques de la métrique de Thompson]. In: Annales De L’institut Fourier, 65, 315–348.
- Liao, A., Yao, G., & Duan, X. (2010). Thompson metric method for solving a class of nonlinear matrix equation. Applied Mathematics and Computation, 216(6), 1831–1836. doi:10.1016/j.amc.2009.12.022
- MathWorks I. MATLAB. MathWorks; 2017.
- Montrucchio, L. (1998). Thompson metric, contraction property and differentiability of policy functions. Journal of Economic Behavior & Organization, 33(3), 449–466. doi:10.1016/S0167-2681(97)00069-3
- Nussbaum, R. D. (1988). Hilbert’s projective metric and iterated nonlinear maps. (AMS Memoirs; Vol. 75, Number 391). Providence (RI): American Mathematical Society.
- Nussbaum, R. D., & Walsh, C. (2004). A metric inequality for the Thompson and Hilbert geometries. Journal of Inequalities in Pure and Applied Mathematics, 5. http://emis.ams.org/journals/JIPAM/images/131_03_JIPAM/131_03_www.pdf
- Qu, Z. (2014). Contraction of Riccati flows applied to the convergence analysis of a max-plus curse-of-dimensionality–Free method. SIAM Journal on Control and Optimization, 52(5), 2677–2706. doi:10.1137/130906702
- Weyl, H. (1949). Inequalities between the two kinds of eigenvalues of a linear transformation. Proceedings of the National Academy of Sciences of the United States of America, 35(7), 408–411. doi:10.1073/pnas.35.7.408
- Wolfram Research I. (2016). Mathematica. [Internet]. Champaign, Illinois: Wolfram Research, Inc.
- Yang, X. (2000). A matrix trace inequality. Journal of Mathematical Analysis and Applications, 250(1), 372–374. doi:10.1006/jmaa.2000.7068
- Yang, X. M., Yang, X. Q., & Teo, K. L. (2001). A matrix trace inequality. Journal of Mathematical Analysis and Applications, 263(1), 327–331. doi:10.1006/jmaa.2001.7613
- Zhang, J., & Liu, J. (2010). Matrix bounds for the solution of the continuous algebraic Riccati equation. Mathematical Problems in Engineering, 2010, 1–15. doi:10.1155/2010/819064