
Numerical strategies for recursive least squares solutions to the matrix equation AX = B

Pages 497-510 | Received 31 Jan 2022, Accepted 02 Sep 2022, Published online: 21 Oct 2022

Abstract

The recursive solution to the Procrustes problem, with or without constraints, is thoroughly investigated. Given known matrices A and B, the proposed solution minimizes the square of the Frobenius norm of the difference AX − B when rows or columns are added to A and B. The proposed method is based on efficient strategies which reduce the computational cost by utilizing previous computations when new data are acquired. This is particularly useful in the iterative solution of an unbalanced orthogonal Procrustes problem. The results show that the computational efficiency of the proposed recursive algorithms is more significant when the dimensions of the matrices are large. This demonstrates the usefulness of the proposed algorithms in the presence of high-dimensional data sets. The practicality of the new method is demonstrated through an application in machine learning, namely feature extraction for image processing.


1. Introduction

In practice, one may be interested in finding the matrix X such that AX = B, where the matrices A and B come from experiments. However, A and B often do not satisfy the solvability conditions and hence the least squares solution which minimizes the difference AX − B is required [Citation15]. Specifically, the problem of transforming a given matrix A by a matrix X so that it approximates another given matrix B, that is, so that the square of the norm of the difference AX − B is minimized, is known as the Procrustes problem [Citation13,Citation15,Citation33]. Often, depending on the application, it is assumed that X belongs to a specific class of matrices, thus setting a set of constraints on the optimization problem. The most frequent constraints imposed on X are orthogonality and symmetry, and variants thereof; see, for example, [Citation5,Citation11,Citation15,Citation19,Citation21,Citation29,Citation31,Citation33]. In many cases, orthogonal factorizations such as the singular value decomposition (SVD), the eigenvalue decomposition (EVD) and the CS decomposition have been used to solve the Procrustes problem and variants thereof [Citation8,Citation15,Citation19,Citation22,Citation31,Citation33, pp. 327–328].

The application of the Procrustes problem in factor analysis has a long history [Citation10,Citation13,Citation15,Citation28]. It also appears in numerical analysis problems for the solution of partial differential equations, in multidimensional scaling, in growth curve modelling, in scientific computing, in computer vision, in image processing, in system and control theory, in the analysis of space structures and in aerospace engineering for spacecraft attitude determination [Citation12,Citation23,Citation24,Citation26,Citation27,Citation32,Citation38,Citation41,Citation43]. Moreover, in the case where A is the identity matrix the problem becomes a matrix nearness problem with many applications in statistical and financial modelling and in theoretical computer science; see, for example, [Citation20,Citation36,Citation39].

The recursive solution to a least squares problem is needed when the experiment is conducted repeatedly and, as a result, the given matrices are updated regularly with newly arriving data. Also, in high-dimensional settings, the matrices A and B are very large and it may not be possible to treat all data at once, or the computational cost of processing them may be prohibitively high. In this case, a sequential procedure which splits A and B into sub-matrices of smaller dimensions and then proceeds by gradually incorporating the sub-matrices into the least squares solution of AX = B is essential. Recursive least squares is often needed in many problems across different areas such as engineering, statistics, econometrics and finance [Citation7,Citation9,Citation17,Citation18,Citation42]. A recursive algorithm reduces the computational cost and also the storage requirements for large matrices.

Herein, the recursive least squares solution to the matrix equation AX = B, when A and B are known matrices, is investigated in depth. Namely, the recursive solution to the Procrustes problem is examined. The use of the QR decomposition is examined when there are no constraints on X. The problems of minimizing the difference AX − B when X is orthogonal and when X is symmetric are also considered, and their recursive least squares solutions using the eigenvalue and singular value decompositions are explored in depth. When constraints are imposed on X, the method of Lagrange multipliers is used to solve the optimization problem. The proposed solution, in each case, is the matrix which minimizes the square of the Frobenius norm of AX − B. It is an exact solution in the sense that X is explicitly determined and does not comprise any arbitrary elements. The proposed recursive numerical solution does not require the matrices to be of full rank.

Throughout this paper, $\|\cdot\|_F$ denotes the Frobenius norm. Also, for known matrices $S$ and $P$, when computing partial derivatives [Citation25, p. 201] the following properties are used:
$$\frac{\partial (SXP)}{\partial X} = S^T P^T, \qquad \frac{\partial (S X^T P)}{\partial X} = P S. \qquad (1)$$

The paper is organized as follows. Section 2 introduces the general Procrustes problem, where no assumption is made for the solution matrix $X$. The problem is solved using the QR decomposition and then the recursive solution is presented. Section 3 considers the orthogonal Procrustes problem, where the solution matrix $X$ is orthogonal, and Section 4 derives the solution to the symmetric Procrustes problem, when $X$ is assumed to be symmetric. Section 5 presents computational results and finally, in Section 6, we conclude and discuss future work.

2. Numerical solution to the general Procrustes problem

Consider the problem of finding a matrix $X \in \mathbb{R}^{n \times n}$ so that the known matrix $B \in \mathbb{R}^{m \times n}$ is approximated by the matrix $AX$, where $A \in \mathbb{R}^{m \times n}$ is also known. That is, a solution to the matrix equation
$$AX = B \qquad (2)$$
is required. The least squares approximation problem to be solved is given by
$$\operatorname*{argmin}_{X} \|AX - B\|_F^2, \qquad (3)$$
where
$$f(X) = \|AX - B\|_F^2 = \operatorname{trace}\bigl((AX - B)^T (AX - B)\bigr) = \operatorname{trace}\bigl(X^T A^T A X - X^T A^T B - B^T A X + B^T B\bigr).$$
On using (1), partial differentiation yields
$$\frac{\partial f(X)}{\partial X} = 2 A^T A X - 2 A^T B = 0, \qquad (4)$$
whence
$$A^T A X = A^T B. \qquad (5)$$
When $A^T A$ is non-singular, the solution to (5) is given by $X = (A^T A)^{-1} A^T B$ [Citation14]. However, $A^T A$ may be ill-conditioned and in this case inverting the matrix may give inaccurate results. In the case where $A^T A$ is singular, the latter fails to give a solution for $X$. A numerically stable method to obtain $X$ is to use the QR decomposition (QRD) of $A$, namely
$$Q_A^T \begin{pmatrix} A & B \end{pmatrix} = \begin{pmatrix} R_A & R_{B_1} \\ 0 & R_{B_2} \end{pmatrix}, \quad \text{where} \quad Q_A = \begin{pmatrix} Q_{A_1} & Q_{A_2} \end{pmatrix}. \qquad (6)$$
Then $A = Q_{A_1} R_A$ and hence $A^T A = R_A^T R_A$ and $A^T B = R_A^T R_{B_1}$. Therefore, $X = R_A^{-1} R_{B_1}$. When $A$ is rank deficient, the procedure to obtain $X$ is similar to the above, namely (6). In this case, a complete QRD can be computed to triangularize $A$ [Citation1].
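As an illustration of the QR-based solution above, a minimal sketch in Python/NumPy follows (the function name and test data are illustrative and full column rank of $A$ is assumed, so the rank-deficient case handled by the complete QRD is not covered):

```python
import numpy as np

def procrustes_qr(A, B):
    """Solve argmin_X ||AX - B||_F via the QRD of the compound matrix (A B), as in (6).

    A minimal sketch assuming A (m x n) has full column rank."""
    n = A.shape[1]
    # Q_A^T (A  B) = (R_A  R_B1; 0  R_B2)
    _, R = np.linalg.qr(np.hstack((A, B)), mode="reduced")
    R_A, R_B1 = R[:n, :n], R[:n, n:]
    # X = R_A^{-1} R_B1, obtained by a triangular solve rather than explicit inversion
    return np.linalg.solve(R_A, R_B1)

# Usage: the computed X satisfies the normal equations (5)
rng = np.random.default_rng(0)
A, B = rng.standard_normal((50, 8)), rng.standard_normal((50, 8))
X = procrustes_qr(A, B)
print(np.allclose(A.T @ A @ X, A.T @ B))
```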

2.1. Recursive solution

We next consider the recursive solution of the Procrustes problem when the matrices $A$ and $B$ are updated. Without loss of generality, suppose that $A$ and $B$ are augmented with the addition of a single row, namely
$$\tilde{A} = \begin{pmatrix} A \\ a \end{pmatrix}, \qquad \tilde{B} = \begin{pmatrix} B \\ b \end{pmatrix}, \qquad (7)$$
where $A, B$ are as in (2) and $a, b \in \mathbb{R}^{1 \times n}$ represent new data points. Then, the updated Procrustes problem, based on $\tilde{A}$ and $\tilde{B}$, requires the solution of the least squares problem
$$\operatorname*{argmin}_{\tilde{X}} \|\tilde{A}\tilde{X} - \tilde{B}\|_F^2 \qquad (8)$$
when (3) has already been solved (see (4)–(6)). The efficient solution of (8) requires that previous computations from the solution of (3) be utilized. Namely,
$$\operatorname*{argmin}_{\tilde{X}} \|\tilde{A}\tilde{X} - \tilde{B}\|_F^2 = \operatorname*{argmin}_{\tilde{X}} \left\| \begin{pmatrix} Q_{A_1}^T & 0 \\ 0 & 1 \end{pmatrix} \left[ \begin{pmatrix} A \\ a \end{pmatrix} \tilde{X} - \begin{pmatrix} B \\ b \end{pmatrix} \right] \right\|_F^2 = \operatorname*{argmin}_{\tilde{X}} \left\| \begin{pmatrix} R_A \\ a \end{pmatrix} \tilde{X} - \begin{pmatrix} R_{B_1} \\ b \end{pmatrix} \right\|_F^2,$$
where $Q_{A_1}$, $R_A$ and $R_{B_1}$ are as in (6) and the lower-right block of the transformation is the scalar 1, since a single row is appended. Consider now the updating QRD
$$Q_{Au}^T \begin{pmatrix} R_A & R_{B_1} \\ a & b \end{pmatrix} = \begin{pmatrix} \tilde{R}_A & \tilde{R}_{B_1} \\ 0 & \tilde{R}_{B_2} \end{pmatrix}. \qquad (9)$$
We then have that
$$\operatorname*{argmin}_{\tilde{X}} \|\tilde{A}\tilde{X} - \tilde{B}\|_F^2 = \operatorname*{argmin}_{\tilde{X}} \bigl( \|\tilde{R}_A \tilde{X} - \tilde{R}_{B_1}\|_F^2 + \|\tilde{R}_{B_2}\|_F^2 \bigr)$$
and therefore it follows that $\tilde{X} = \tilde{R}_A^{-1} \tilde{R}_{B_1}$.
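A minimal sketch of this row-updating recursion follows (illustrative names; for simplicity the small $(n+1) \times 2n$ matrix is re-triangularized with a dense QRD, whereas an efficient implementation would apply Givens rotations to exploit the triangular structure, cf. the flop counts in Section 5):

```python
import numpy as np

def procrustes_row_update(R_A, R_B1, a, b):
    """Row-updating step (9): refresh the triangular factors after the row
    pair (a, b) is appended to (A, B), without revisiting A or B.

    A minimal sketch assuming R_A and R_B1 come from the QRD (6)."""
    n = R_A.shape[0]
    stacked = np.vstack((np.hstack((R_A, R_B1)),
                         np.hstack((a.reshape(1, -1), b.reshape(1, -1)))))
    R = np.linalg.qr(stacked, mode="r")          # triangular factor only
    R_A_new, R_B1_new = R[:n, :n], R[:n, n:]
    X_new = np.linalg.solve(R_A_new, R_B1_new)   # updated solution X~ = R~_A^{-1} R~_B1
    return R_A_new, R_B1_new, X_new
```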

In many cases, it is possible that the matrices $A$ and $B$ are updated with a single column, namely
$$\check{A} = \begin{pmatrix} A & A_{n+1} \end{pmatrix}, \qquad \check{B} = \begin{pmatrix} B & B_{n+1} \end{pmatrix},$$
where $A_{n+1}, B_{n+1} \in \mathbb{R}^{m \times 1}$ denote new variables which become available after (3) has been solved. As a result, the solution $\check{X} \in \mathbb{R}^{(n+1) \times (n+1)}$ to the updated Procrustes problem
$$\operatorname*{argmin}_{\check{X}} \|\check{A}\check{X} - \check{B}\|_F^2 \qquad (10)$$
needs to be computed. By efficiently utilizing previous computations from the solution of (3), (10) is written as
$$\operatorname*{argmin}_{\check{X}} \|\check{A}\check{X} - \check{B}\|_F^2 = \operatorname*{argmin}_{\check{X}} \left\| Q_A^T \left( \begin{pmatrix} A & A_{n+1} \end{pmatrix} \check{X} - \begin{pmatrix} B & B_{n+1} \end{pmatrix} \right) \right\|_F^2 = \operatorname*{argmin}_{\check{X}} \left\| \begin{pmatrix} R_A & \tilde{A}_{n+1} \\ 0 & \hat{A}_{n+1} \end{pmatrix} \check{X} - \begin{pmatrix} R_{B_1} & \tilde{B}_{n+1} \\ R_{B_2} & \hat{B}_{n+1} \end{pmatrix} \right\|_F^2,$$
where $Q_A$ is from the QRD of $A$ in (6). The column-updating QRD that needs to be computed is then given by
$$\begin{pmatrix} I_n & 0 \\ 0 & \check{q}^T \end{pmatrix} \begin{pmatrix} R_A & \tilde{A}_{n+1} \\ 0 & \hat{A}_{n+1} \end{pmatrix} = \begin{pmatrix} R_A & \tilde{A}_{n+1} \\ 0 & \check{a} \\ 0 & 0 \end{pmatrix},$$
where $\check{q}$ is an orthogonal transformation that eliminates all but the first element of $\hat{A}_{n+1}$ and $\check{a}$ is a scalar. Hence, the updated Procrustes problem (10) becomes
$$\operatorname*{argmin}_{\check{X}} \|\check{A}\check{X} - \check{B}\|_F^2 = \operatorname*{argmin}_{\check{X}} \bigl( \|\check{R}_A \check{X} - \check{R}_B\|_F^2 + \|\check{R}_{B_2}\|_F^2 \bigr), \qquad (11)$$
where
$$\check{R}_A = \begin{pmatrix} R_A & \tilde{A}_{n+1} \\ 0 & \check{a} \end{pmatrix}, \qquad \check{R}_B = \begin{pmatrix} R_{B_1} & \tilde{B}_{n+1} \\ R_{B_2}^{(1)} & \hat{B}_{n+1}^{(1)} \end{pmatrix} \qquad \text{and} \qquad \check{R}_{B_2} = \begin{pmatrix} R_{B_2}^{(2)} & \hat{B}_{n+1}^{(2)} \end{pmatrix},$$
with the superscripts $(1)$ and $(2)$ denoting, respectively, the first row and the remaining rows of the blocks after the transformation $\check{q}^T$ has been applied. The solution to problem (11) is given by $\check{X} = \check{R}_A^{-1} \check{R}_B$.
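The column-updating step can be sketched as follows (illustrative names; it assumes that the full orthogonal factor $Q_A$ and the block $R_{B_2}$ of the QRD (6) have been retained, and that $m > n$):

```python
import numpy as np

def procrustes_column_update(Q_A, R_A, R_B1, R_B2, A_new, B_new):
    """Sketch of the column-updating step of Section 2.1: a single column pair
    (A_new, B_new) is appended to (A, B).  Q_A (m x m), R_A, R_B1, R_B2 are the
    factors of the original QRD (6); names are illustrative."""
    n = R_A.shape[0]
    a_rot, b_rot = Q_A.T @ A_new, Q_A.T @ B_new          # Q_A^T A_{n+1}, Q_A^T B_{n+1}
    a_top, a_bot = a_rot[:n], a_rot[n:]                  # A~_{n+1}, A^_{n+1}
    b_top, b_bot = b_rot[:n], b_rot[n:]                  # B~_{n+1}, B^_{n+1}
    # orthogonal q eliminating all but the first element of A^_{n+1}
    q, r = np.linalg.qr(a_bot.reshape(-1, 1), mode="complete")
    a_check = r[0, 0]                                    # the scalar in (11)
    Rb2_rot, b_bot_rot = q.T @ R_B2, q.T @ b_bot
    # assemble the (n+1) x (n+1) factors of (11) and solve
    R_A_new = np.block([[R_A, a_top.reshape(-1, 1)],
                        [np.zeros((1, n)), np.array([[a_check]])]])
    R_B_new = np.block([[R_B1, b_top.reshape(-1, 1)],
                        [Rb2_rot[:1, :], b_bot_rot[:1].reshape(1, 1)]])
    return np.linalg.solve(R_A_new, R_B_new)             # updated X = R_A^{-1} R_B
```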

3. The orthogonal Procrustes problem

The orthogonal Procrustes problem (OPP) is that of minimizing the sum of the squared error of the difference matrix AXB when the unknown matrix X is orthogonal. The constraint is imposed by using the method of Lagrange multipliers; the matrices A and B need not be full rank [Citation33]. The constrained optimization problem is then given by (12) argminXAXBF2subject to XXT=XTX=I,(12) where A,BRm×n and XRn×n is orthogonal. Herein, we are most interested in cases where m<n. To find the solution to (Equation12), we consider the Lagrangian function L(X)=trace((AXB)T(AXB))+trace(Λ(XXTI)),which is equivalently written as (13) L(X)=j=1ni=1m(q=1naiqxqjbij)2+q,r=1nλqr(k=1nxqsxrsδqr),(13) where Λ=[λqr]q,r=1n is the symmetric matrix of Lagrange multipliers [Citation15]. Partial differentiation of (Equation13) yields (14) L(X)xpj=2i=1m(q=1naiqxqjbij)aip+2r=1nλqrxrj.(14) On setting (Equation14) equal to zero and using matrix notation, (Equation14) can be written as AqTAqXj+ΛqXj=AqTBj,q,j=1,,n,or equivalently, as (15) (ATA+Λ)X=ATB,(15) where Aq is the qth column of A, Xj, Bj are the jth columns of X and B, respectively, and Λq is the qth row of Λ. From (Equation15), (ATA+Λ)XXT(ATA+Λ)T=ATB(ATB)T, therefore Λ=(ATBBTA)1/2ATA and, as observed earlier, Λ is symmetric. Now on post-multiplying (Equation15) by XT gives ATA+Λ=ATBXT which implies that Λ=ATBXTATA, whence ATBXT=XBTA. Therefore, (16) ATB=XBTAX.(16) Furthermore, let H=ATB and consider the following two matrices F=HHT=ATBBTAand G=HTH=BTAATB.Matrices F, GRn×n are symmetric and thus they are diagonalizable and their eigenvalue decomposition (EVD) exists, that is, (17a) F=UDUT(17a) and (17b) G=VDVT,(17b) where U,VRn×n are orthogonal. Additionally, since they are of the form HHT and HTH they have the same eigenvalues. Now on using (Equation16) it follows that F=ATBBTA=(XBTAX)(XBTAX)T=XBTAATBXT=XGXT,and from (Equation17a) and (Equation17b) we now have that F=UDUT=XVDVTXT,which implies that U=XV, where U is as in (Equation17a). Therefore, the solution to the least squares problem (Equation12) is given by X=UVT.Finally, a sufficient condition which guaranties the uniqueness of the solution X and that the argument in (Equation12) is minimized requires that all the diagonal elements of the matrix D1/2 in (Equation17a) and (Equation17b) are non-negative [Citation15,Citation33]. In essence, this condition determines the orientation of the orthogonal matrices U and V and is specified by the Eckart–Young decomposition (see [Citation10]) of ATB, namely ATB=UD1/2VT.Notice that in the case of symmetric orthogonality the procedure is the same but at the last step the symmetry of X needs to be taken into account. Namely, the sum of UVT is not explicitly computed since only the upper or lower part of X needs to be determined. This results in a dimensional reduction of the solution matrix from n2 to n(n+1)/2 [Citation40].

3.1. Recursive solution to the orthogonal Procrustes problem

Suppose that an updated orthogonal Procrustes problem needs to be solved when new data become available. Without loss of generality, it is assumed that the original OPP (12) needs to be re-solved after appending a new single row of data to the matrices $A$ and $B$. That is, let the row-updated matrices $\tilde{A}$ and $\tilde{B}$ be as in (7), where $A, B$ are as in (12) and $a, b \in \mathbb{R}^{1 \times n}$. The updated orthogonal Procrustes problem based on $\tilde{A}$ and $\tilde{B}$ requires that the solution of the following least squares problem be derived:
$$\operatorname*{argmin}_{\tilde{X}} \|\tilde{A}\tilde{X} - \tilde{B}\|_F^2 \quad \text{subject to} \quad \tilde{X}\tilde{X}^T = \tilde{X}^T\tilde{X} = I, \qquad (18)$$
where $\tilde{X} \in \mathbb{R}^{n \times n}$. The solution to the least squares problem (18) is obtained by using the method of Lagrange multipliers, namely
$$L(\tilde{X}) = \operatorname{trace}\bigl((\tilde{A}\tilde{X} - \tilde{B})^T (\tilde{A}\tilde{X} - \tilde{B})\bigr) + \operatorname{trace}\bigl(\tilde{\Lambda}(\tilde{X}\tilde{X}^T - I)\bigr), \qquad (19)$$
which, on using (1), yields the first order condition
$$\frac{\partial L(\tilde{X})}{\partial \tilde{X}} = 2\tilde{A}^T\tilde{A}\tilde{X} - 2\tilde{A}^T\tilde{B} + (\tilde{\Lambda}^T + \tilde{\Lambda})\tilde{X}.$$
In a way similar to (12), (19) has the solution
$$\tilde{X} = \tilde{U}\tilde{V}^T, \qquad (20)$$
where $\tilde{U}$ and $\tilde{V}$ are the orthogonal matrices of the EVDs of $\tilde{A}^T\tilde{B}\tilde{B}^T\tilde{A}$ and $\tilde{B}^T\tilde{A}\tilde{A}^T\tilde{B}$, respectively.

The recursive solution of the OPP presumes that previous computations in (17a) from the solution of the original Procrustes problem (12) are efficiently utilized. Consider the recursive computation of the EVD of $\tilde{A}^T\tilde{B}\tilde{B}^T\tilde{A}$, namely
$$\tilde{A}^T\tilde{B}\tilde{B}^T\tilde{A} = (A^T B + a^T b)(A^T B + a^T b)^T = A^T B B^T A + A^T B b^T a + a^T b B^T A + a^T b b^T a = U D U^T + A^T B b^T a + a^T b B^T A + a^T b b^T a \qquad (21)$$
on using (17a). Therefore, the recursive solution of an orthogonal Procrustes problem becomes a modified symmetric matrix eigenvalue problem [Citation3,Citation4,Citation16]. In particular, (21) implies three rank-1 modifications of the EVD $U D U^T$ of the matrix $A^T B B^T A$ in (17a). That is, the EVD of $\tilde{A}^T\tilde{B}\tilde{B}^T\tilde{A}$ is obtained recursively in three main steps. First, (21) is written as
$$\tilde{A}^T\tilde{B}\tilde{B}^T\tilde{A} = U D U^T + \begin{pmatrix} \tilde{b} \\ a \end{pmatrix}^T \begin{pmatrix} 0 & 1 \\ 1 & \beta \end{pmatrix} \begin{pmatrix} \tilde{b} \\ a \end{pmatrix}, \qquad (22)$$
where $\tilde{b} = b B^T A$ and $\beta = b b^T$ is a scalar, since $b$ and $a$ are row vectors. Second, the following EVD is derived:
$$\Delta = \begin{pmatrix} 0 & 1 \\ 1 & \beta \end{pmatrix} = Q \begin{pmatrix} \beta_1 & 0 \\ 0 & \beta_2 \end{pmatrix} Q^T \qquad (23)$$
and, since $\Delta$ is a symmetric matrix, it is diagonalizable. Third, on using (23), (22) becomes
$$\tilde{A}^T\tilde{B}\tilde{B}^T\tilde{A} = U D U^T + \begin{pmatrix} \tilde{b} \\ a \end{pmatrix}^T Q \begin{pmatrix} \beta_1 & 0 \\ 0 & \beta_2 \end{pmatrix} Q^T \begin{pmatrix} \tilde{b} \\ a \end{pmatrix} = U D U^T + \begin{pmatrix} \tilde{b}_1 \\ \tilde{b}_2 \end{pmatrix}^T \begin{pmatrix} \beta_1 & 0 \\ 0 & \beta_2 \end{pmatrix} \begin{pmatrix} \tilde{b}_1 \\ \tilde{b}_2 \end{pmatrix} = U D U^T + \beta_1 \tilde{b}_1^T \tilde{b}_1 + \beta_2 \tilde{b}_2^T \tilde{b}_2 = U (D + \beta_1 \tilde{\tilde{b}}_1^T \tilde{\tilde{b}}_1) U^T + \beta_2 \tilde{b}_2^T \tilde{b}_2, \qquad (24)$$
where $\bigl(\begin{smallmatrix} \tilde{b}_1 \\ \tilde{b}_2 \end{smallmatrix}\bigr) = Q^T \bigl(\begin{smallmatrix} \tilde{b} \\ a \end{smallmatrix}\bigr)$, $\tilde{\tilde{b}}_1 = \tilde{b}_1 U$ and $\beta_1$, $\beta_2$ are the eigenvalues of $\Delta$. Consider now the sequential updating of the diagonal matrix $D$ in two steps. The first step computes the EVD $D + \beta_1 \tilde{\tilde{b}}_1^T \tilde{\tilde{b}}_1 = U_1 D_1 U_1^T$, where $U_1 \in \mathbb{R}^{n \times n}$ is orthogonal and $D_1 \in \mathbb{R}^{n \times n}$ is diagonal. The second step computes the EVD $D_1 + \beta_2 \tilde{\tilde{b}}_2^T \tilde{\tilde{b}}_2 = U_2 \tilde{D} U_2^T$, where $\tilde{\tilde{b}}_2 = \tilde{b}_2 U U_1$, $U_2 \in \mathbb{R}^{n \times n}$ is orthogonal and $\tilde{D} \in \mathbb{R}^{n \times n}$ is the diagonal matrix whose elements are the eigenvalues of $\tilde{A}^T\tilde{B}\tilde{B}^T\tilde{A}$. That is, $\tilde{A}^T\tilde{B}\tilde{B}^T\tilde{A} = \tilde{U}\tilde{D}\tilde{U}^T$, where $\tilde{U} = U U_1 U_2$ and $U$ is as in (17a). Repeating the procedure of steps (21)–(24) for $\tilde{B}^T\tilde{A}\tilde{A}^T\tilde{B}$ derives its EVD recursively and therefore gives $\tilde{B}^T\tilde{A}\tilde{A}^T\tilde{B} = \tilde{V}\tilde{D}\tilde{V}^T$, where $\tilde{V} = V V_1 V_2$, $V$ is defined in (17b) and $V_1, V_2$ are computed in a way similar to that for (24). Therefore, the updated solution (20) to the Procrustes problem (18) has been derived.
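The rank-1 modifications of steps (21)–(24) can be sketched structurally as follows (illustrative names; $D$ is passed as the vector of eigenvalues in (17a), and the two dense eigh calls stand in for proper rank-one eigenvalue updates based on the secular equation [Citation3,Citation4,Citation16], so the sketch reproduces the algebra but not the reduced flop count):

```python
import numpy as np

def opp_evd_row_update(U, D, AtB, a, b):
    """Structural sketch of (21)-(24): update the EVD of F = A^T B B^T A when
    the row pair (a, b) is appended.  U, D satisfy F = U diag(D) U^T and
    AtB = A^T B; a and b are 1-D arrays holding the new rows."""
    b_t = b @ AtB.T                      # b~ = b B^T A
    beta = float(b @ b)                  # beta = b b^T
    # EVD of the 2 x 2 core Delta = [[0, 1], [1, beta]] as in (23)
    betas, Q = np.linalg.eigh(np.array([[0.0, 1.0], [1.0, beta]]))
    W = Q.T @ np.vstack((b_t, a))        # rows b~_1 and b~_2 of (24)
    # first rank-one correction: D + beta_1 (b~_1 U)^T (b~_1 U) = U_1 D_1 U_1^T
    w1 = W[0] @ U
    D1, U1 = np.linalg.eigh(np.diag(D) + betas[0] * np.outer(w1, w1))
    # second rank-one correction: D_1 + beta_2 (b~_2 U U_1)^T (b~_2 U U_1) = U_2 D~ U_2^T
    w2 = W[1] @ U @ U1
    D2, U2 = np.linalg.eigh(np.diag(D1) + betas[1] * np.outer(w2, w2))
    return U @ U1 @ U2, D2               # U~ = U U_1 U_2 and the eigenvalues of F~
```

The companion call opp_evd_row_update(V, D, AtB.T, b, a) updates the factor V of G in the same way, and the updated OPP solution is then $\tilde{X} = \tilde{U}\tilde{V}^T$ as in (20).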

When extra columns are added to $A$ and $B$, that is, when
$$\breve{A} = \begin{pmatrix} A & A_{n+1} \end{pmatrix} \qquad \text{and} \qquad \breve{B} = \begin{pmatrix} B & B_{n+1} \end{pmatrix}, \qquad (25)$$
where the blocks $A_{n+1}$ and $B_{n+1}$ hold the $k$ new columns, an updated orthogonal Procrustes problem of larger dimensions needs to be solved. Namely,
$$\operatorname*{argmin}_{\breve{X}} \|\breve{A}\breve{X} - \breve{B}\|_F^2 \quad \text{subject to} \quad \breve{X}\breve{X}^T = \breve{X}^T\breve{X} = I, \qquad (26)$$
where $\breve{X} \in \mathbb{R}^{(n+k) \times (n+k)}$ has been augmented by $k$ columns and $k$ rows. In a similar way as for (18), the solution of (26) is given by $\breve{X} = \breve{U}\breve{V}^T$. It is obtained recursively, by updating the original orthogonal decompositions as in (24). Notice that the addition of extra columns implies a rank-$k$ updating of the EVD.

4. The symmetric Procrustes problem

Consider the problem of minimizing the sum of the squared errors of the difference matrix $A X_S - B$ when the unknown matrix $X_S$ is symmetric. The constrained optimization problem is given by
$$\operatorname*{argmin}_{X_S} \|A X_S - B\|_F^2 \quad \text{subject to} \quad X_S^T = X_S,$$
where $X_S \in \mathbb{R}^{n \times n}$ is a symmetric matrix. As in the case of the OPP (12), the matrices $A$ and $B$ are not necessarily of full rank. Using the method of Lagrange multipliers, the problem becomes that of finding the matrix $X_S$ which minimizes
$$L(X_S) = \operatorname{trace}\bigl((A X_S - B)^T (A X_S - B)\bigr) + \operatorname{trace}\bigl(\Lambda (X_S - X_S^T)\bigr).$$
On using (1), partial differentiation yields
$$\frac{\partial L(X_S)}{\partial X_S} = 2 A^T A X_S - 2 A^T B + \Lambda^T - \Lambda,$$
which is set to zero. Now let $M = \Lambda - \Lambda^T$, which is skew-symmetric, that is, $M = -M^T$. It follows that $2 A^T A X_S - 2 A^T B = M$ and, on transposing and using the skew-symmetry of $M$,
$$2 A^T A X_S - 2 A^T B = -(2 X_S A^T A - 2 B^T A),$$
which yields the Lyapunov equation
$$A^T A X_S + X_S A^T A = A^T B + B^T A. \qquad (27)$$
Since $A^T A$ is symmetric, it is a diagonalizable matrix. That is, there is an orthogonal matrix $P$ and a diagonal matrix $D_A$ such that
$$A^T A = P D_A P^T, \qquad (28)$$
where the columns of $P \in \mathbb{R}^{n \times n}$ are the eigenvectors of $A^T A$ and $D_A = \operatorname{diag}(\mu_1, \ldots, \mu_n)$ has as diagonal elements the eigenvalues of $A^T A$. On using (28), (27) becomes
$$D_A X_S^{(P)} + X_S^{(P)} D_A = S, \qquad (29)$$
where $X_S^{(P)} = P^T X_S P$ and $S = P^T (A^T B + B^T A) P$. By utilizing the diagonal structure of $D_A$, it follows that
$$x_{i,j} = \frac{s_{ij}}{\mu_i + \mu_j},$$
where $X_S^{(P)} = [x_{i,j}]_{i,j=1}^{n}$. The solution is then given by
$$X_S = P X_S^{(P)} P^T.$$
A necessary and sufficient condition for the uniqueness of $X_S$, since $S$ is positive definite, is that all the eigenvalues of $A^T A$ have a negative real part, that is, $A^T A$ is a stable matrix [Citation2,Citation34,Citation35].
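A minimal sketch of this eigendecomposition-based Lyapunov solution follows (illustrative names; it assumes that all sums $\mu_i + \mu_j$ are nonzero, so that (29) can be solved elementwise):

```python
import numpy as np

def symmetric_procrustes(A, B):
    """Sketch of the symmetric Procrustes solution of Section 4: solve the
    Lyapunov equation (27) A^T A X_S + X_S A^T A = A^T B + B^T A through the
    EVD of A^T A."""
    mu, P = np.linalg.eigh(A.T @ A)            # (28): A^T A = P D_A P^T
    S = P.T @ (A.T @ B + B.T @ A) @ P          # transformed right-hand side of (29)
    Xp = S / (mu[:, None] + mu[None, :])       # (29): x_ij = s_ij / (mu_i + mu_j)
    return P @ Xp @ P.T                        # back-transform: X_S = P X_S^(P) P^T

# Usage: the result is symmetric and satisfies the Lyapunov equation (27)
rng = np.random.default_rng(2)
A, B = rng.standard_normal((30, 5)), rng.standard_normal((30, 5))
X_S = symmetric_procrustes(A, B)
print(np.allclose(X_S, X_S.T),
      np.allclose(A.T @ A @ X_S + X_S @ A.T @ A, A.T @ B + B.T @ A))
```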

4.1. Recursive solution to the symmetric Procrustes problem

Consider now the case where the matrices $A$ and $B$ are augmented by the addition of an extra row, as in (7). The updated symmetric Procrustes problem requires the solution of the optimization problem
$$\operatorname*{argmin}_{\tilde{X}_S} \|\tilde{A}\tilde{X}_S - \tilde{B}\|_F^2 \quad \text{subject to} \quad \tilde{X}_S = \tilde{X}_S^T,$$
where $\tilde{A}$ and $\tilde{B}$ are defined as in (7). Using the Lagrange multipliers and the analysis of the symmetric Procrustes problem in the previous section, the solution is obtained from the Lyapunov equation
$$\tilde{A}^T\tilde{A}\tilde{X}_S + \tilde{X}_S\tilde{A}^T\tilde{A} = \tilde{A}^T\tilde{B} + \tilde{B}^T\tilde{A} \qquad (30)$$
by computing the EVD of $\tilde{A}^T\tilde{A}$. To solve (30) recursively, observe that using (28) yields
$$\tilde{A}^T\tilde{A} = A^T A + a^T a = P D_A P^T + a^T a = P (D_A + \tilde{a}^T \tilde{a}) P^T,$$
where $\tilde{a} = a P$. Therefore, the sequential updating of $D_A$ requires one rank-1 update, namely $D_A + \tilde{a}^T \tilde{a} = P_1 \tilde{D}_A P_1^T$, whence
$$\tilde{A}^T\tilde{A} = \tilde{P}\tilde{D}_A\tilde{P}^T$$
with $\tilde{P} = P P_1$. The Lyapunov equation (30) then becomes
$$\tilde{D}_A \tilde{X}_S^{(P)} + \tilde{X}_S^{(P)} \tilde{D}_A = \tilde{S}, \qquad (31)$$
where $\tilde{X}_S^{(P)} = \tilde{P}^T \tilde{X}_S \tilde{P}$ and $\tilde{S} = \tilde{P}^T (\tilde{A}^T\tilde{B} + \tilde{B}^T\tilde{A}) \tilde{P}$. The solution to (31) is obtained in a way similar to that for (29).
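A structural sketch of this recursive update follows (illustrative names; $a$ and $b$ are taken as one-dimensional arrays holding the new row, and the dense eigh call again stands in for a proper rank-one EVD update):

```python
import numpy as np

def symmetric_procrustes_row_update(P, mu, C, a, b):
    """Sketch of Section 4.1: update the symmetric Procrustes solution after
    appending the row pair (a, b) to (A, B).  mu, P is the EVD (28) of A^T A
    (mu the eigenvalues, P the eigenvectors) and C = A^T B + B^T A."""
    a_rot = a @ P                                              # a~ = a P
    mu_new, P1 = np.linalg.eigh(np.diag(mu) + np.outer(a_rot, a_rot))  # D_A + a~^T a~
    P_new = P @ P1                                             # P~ = P P_1
    C_new = C + np.outer(a, b) + np.outer(b, a)                # A~^T B~ + B~^T A~
    S = P_new.T @ C_new @ P_new                                # S~ in (31)
    Xp = S / (mu_new[:, None] + mu_new[None, :])               # solve (31) elementwise
    return P_new @ Xp @ P_new.T, P_new, mu_new                 # X~_S and updated factors
```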

5. Computational remarks

To assess the practical usability of the proposed method, the theoretical complexity of the new algorithms has been studied. Consider the general Procrustes problem when no constraint is imposed on the least squares solution matrix $X$. Let the matrices $\tilde{A}, \tilde{B} \in \mathbb{R}^{(m+1) \times n}$ be as in (7) and assume that the QRD of $A$ in (6) is available. The complexity of computing the QRD of $\tilde{A}$ afresh is $2n^2(m + 1 - n/3)$ floating point operations (flops) and that of applying these orthogonal transformations to the matrix $\tilde{B}$ is $2n^2(2m + n + 3)$ flops. Computing the updating QRD in (9) in order to update $R_A$ and $R_{B_1}$ requires $4n^2$ and $8n^2$ flops, respectively. As a result, computing the QRD of $\tilde{A}$ and applying the orthogonal transformations to $\tilde{B}$ will always be computationally more demanding than computing the solution recursively by utilizing previous calculations. The computational efficiency of the recursive method compared to computing the QRD from scratch is, approximately, $(9m + 2n)/18$; for example, with $m = 500$ and $n = 1000$ the recursive strategy is roughly $(9 \cdot 500 + 2 \cdot 1000)/18 \approx 361$ times cheaper.

The solution of the orthogonal Procrustes problem is derived from the EVDs of the square matrices $F = A^T B (A^T B)^T$ and $G = B^T A A^T B$. However, in practice the SVD $A^T B = U \Sigma V^T$ is computed, since $F = U \Sigma^2 U^T = U D U^T$ and $G = V \Sigma^2 V^T = V D V^T$. When $A$ and $B$ are modified, the computationally efficient solution to the orthogonal Procrustes problem requires that the SVD of $A^T B$ be computed recursively. Therefore, in the recursive solution of the orthogonal and symmetric Procrustes problems, updating SVD decompositions are computed which provide results equivalent to those obtained if the EVD were computed. The SVD of an $m \times n$ matrix requires $n^3 + mn^2 + O(mn)$ flops in the first stage and $kmn^2 + kn^2 + O(mn)$ flops in the second stage, where $k$ is the number of iterations required to compute each singular value. Herein, it is assumed that $m \geq n$ and that $\operatorname{rank}(A^T B) = r < \min(m, n)$.

The algorithms employ a low-rank modification strategy for the recursive updating of the SVD, as in [Citation3]. Algorithms 5.1–5.3 summarize the main computational steps for the solution of the orthogonal Procrustes problem and for the proposed recursive solutions when new rows or columns of data are added, respectively.

To evaluate further the new algorithms, experiments based on synthetic and real data have been conducted. The computational times of the new algorithms have been compared with the algorithm that solves the same problem afresh in order to obtain the efficiency ratio.

Table 1 presents the execution times (in CPU seconds) of Algorithm 5.1 compared with Algorithm 5.3 when m = 50, 100, 250, 500 rows, n = 1000, 5000, and a single column of data is added to A and B, that is, k = 1. The results show that keeping n fixed and increasing m reduces the computational efficiency. However, comparing Panels A and B shows that the efficiency gain is more significant when n increases. Table 2 presents the execution times (in CPU seconds) of Algorithms 5.1 and 5.2 when m = 100, n = 1000 and when m = 500, n = 10,000, with a variable number k of new columns added to A and B. All the times presented are the average times after solving the same OPP (afresh or recursively) 100 times. The computational results in Table 2 show that as the dimensions increase the computational efficiency also increases. Furthermore, the results suggest that the proposed algorithm for the addition of new columns is more efficient when a small number of extra columns is included. Comparing the results in Panels A and B of Table 2, we see that the efficiency of the proposed recursive method increases when the dimensions of the matrices in the original OPP also increase.

Table 1. Execution times in seconds of the recursive solution of the OPP when a single column is added.

Table 2. Execution times in seconds of the recursive solution of the OPP when new columns are added.

The solution of an OPP is often applied in the development of algorithms for face recognition; see, for example, [Citation6,Citation30,Citation37,Citation43]. The algorithm proposed in [Citation43] for feature extraction requires solving, until convergence, a series of updated OPPs as in (12), where A is the data matrix and B is the class indicator matrix, after they have both been centred. The processing of an image starts by considering its pixels as the elements of a matrix and then converting each matrix (or each image) to a column vector. Each row in A has length equal to the number of features of the images. When a new image becomes available and is added to the database, a row is added to A. The face recognition algorithm then has to process the extra image, that is, the additional row of the data matrix A. This is equivalent to row updating as in (18). Herein, Algorithm 5.3 is employed to show the computational efficiency of utilizing previous computations when a series of updating OPPs is solved after augmenting the data matrix.

In Table 3 the algorithm that solves the problems afresh and the recursive algorithm are compared. It is assumed that there are m = 2000, 5000, 10,000, 15,000, 20,000 images available, which have been cropped and re-sized to 16×16 and 32×32 pixels, that is, n = 256, 1024. The run times (in CPU seconds) when an extra image, or alternatively an extra row of data, is included 100 times are presented. The results show that, when new rows of data are added to the problem, the efficiency increases as the number of rows increases. The reason for this is that the recursive algorithm does not depend on m, the number of rows of the matrices A and B. Instead, it depends on the number of columns n of A and B and on the number of rows l added to the model. This is also why the timings of Algorithm 5.3 are the same when n and l are fixed.

Table 3. Total execution times in seconds of the recursive solution of the OPP when a new single row of data is added 100 times.

Overall, the results show that the computational efficiency of the proposed recursive algorithm increases as the dimensions of the matrices (data) increase. The computational gains are most significant when the number of rows or columns appended is small compared to the dimensions of the original matrices, that is, when the modification is of low rank. This demonstrates the practicality of the proposed method when sequentially solving OPPs with small modifications in the underlying dataset. The proposed methods are particularly useful when the matrices involved are large-scale and the data are high-dimensional. All the reported computational results were obtained on a 64-bit 1.80 GHz Core(TM) i7-8550U processor with 16.0 GB RAM using R (version 3.6.1).

6. Conclusions and future work

The recursive least squares solution to the matrix equation AX = B when new rows or columns of data are added to the data matrices A and B has been thoroughly investigated. A computationally efficient algorithm based on the singular value decomposition is proposed. Computational results are presented for synthetic data and also for a machine learning application based on feature extraction for face recognition. The recursive solution to the symmetric Procrustes problem when extra data are included is also investigated. The experimental results suggest that the proposed algorithms are computationally more efficient when the matrices are high-dimensional and when they are augmented with a small number of rows or columns.

The extension of the proposed method to other special classes of matrices, such as reflexive and anti-reflexive matrices, Stiefel matrices or Toeplitz matrices, merits further investigation. Future work will also consider the solution to the Procrustes problem with regularization constraints after modifying the matrices with the addition or deletion of data.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • E. Anderson, Z. Bai, and J. Dongarra, Generalized QR factorization and its applications, Linear Algebra Appl. 162–164(0) (1992), pp. 243–271.
  • R.H. Bartels and G.W. Stewart, Solution of the matrix equation AX + XB = C [F4], Commun. ACM 15(9) (1972), pp. 820–826.
  • M. Brand, Fast low-rank modifications of the thin singular value decomposition, Linear Algebra Appl. 415(1) (2006), pp. 20–30.
  • J.R. Bunch and C.P. Nielsen, Updating the singular value decomposition, Numer. Math. 31(2) (1978), pp. 111–129.
  • J.R. Cardoso and K. Ziętak, On a sub-Stiefel Procrustes problem arising in computer vision, Numer. Linear Algebra Appl. 22(3) (2015), pp. 523–547.
  • D. Chen, X. Cao, F. Wen, and J. Sun. Blessing of dimensionality: High-dimensional feature and its efficient compression for face verification, in 2013 Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 2013, pp. 3025–3032.
  • R. Christensen, L.M. Pearson, and W. Johnson, Case-deletion diagnostics for mixed models, Technometrics 34(1) (1992), pp. 38–45.
  • K. Chu, Singular value and generalized singular value decompositions and the solution of linear matrix equations, Linear Algebra Appl. 88 (1987), pp. 83–98.
  • R.D. Cook, Influential observations in linear regression, J. Am. Stat. Assoc. 74(365) (1979), pp. 169–174.
  • C. Eckart and G. Young, The approximation of one matrix by another of lower rank, Psychometrika 1(3) (1936), pp. 211–218.
  • L. Eldén and H. Park, A procrustes problem on the Stiefel manifold, Numer. Math. 82(4) (1999), pp. 599–619.
  • Y. Gong, S. Lazebnik, A. Gordo, and F. Perronnin, Iterative quantization: A procrustean approach to learning binary codes for large-scale image retrieval, IEEE Trans. Pattern Anal. Mach. Intell. 35(12) (2013), pp. 2916–2929.
  • J.C. Gower, Generalized procrustes analysis, Psychometrika 40(1) (1975), pp. 33–51.
  • J.C. Gower, Procrustes methods, Interdiscip. Rev. Comput. Stat. 2(4) (2010), pp. 503–508.
  • B.F. Green, The orthogonal approximation of an oblique structure in factor analysis, Psychometrika 17(4) (1952), pp. 429–440.
  • M. Gu and S.C. Eisenstat, Downdating the singular value decomposition, SIAM J. Matrix Anal. Appl. 16(3) (1995), pp. 793–810.
  • S. Hadjiantoni and E.J. Kontoghiorghes, Estimating large-scale general linear and seemingly unrelated regressions models after deleting observations, Stat. Comput. 27(2) (2017), pp. 349–361.
  • S. Hadjiantoni and E.J. Kontoghiorghes, A recursive three-stage least squares method for large-scale systems of simultaneous equations, Linear Algebra Appl. 536 (2018), pp. 210–227.
  • N.J. Higham, The symmetric procrustes problem, BIT Numer. Math. 28(1) (1988), pp. 133–143.
  • N.J. Higham and N. Strabić, Bounds for the distance to the nearest correlation matrix, SIAM J. Matrix Anal. Appl. 37(3) (2016), pp. 1088–1102.
  • D. Hua, On the symmetric solutions of linear matrix equations, Linear Algebra Appl. 131 (1990), pp. 1–7.
  • A.-P. Liao and Y. Lei, Least-squares solution with the minimum-norm for the matrix equation (AXB, GXH) = (C, D), Comput. Math. Appl. 50(3) (2005), pp. 539–549.
  • X. Liu, Hermitian and non-negative definite reflexive and anti-reflexive solutions to AX=B, Int. J. Comput. Math. 95(8) (2018), pp. 1666–1671.
  • Y.H. Liu, Ranks of least squares solutions of the matrix equation AXB=C, Comput. Math. Appl. 55(6) (2008), pp. 1270–1278.
  • J.R. Magnus and H. Neudecker, Matrix Differential Calculus with Applications in Statistics and Econometrics, 3rd ed., Wiley, Chichester, UK, 2007.
  • F.L. Markley, Attitude determination using vector observations and the singular value decomposition, J. Astronaut. Sci. 36(3) (1988), pp. 245–258.
  • C. Meng, X. Hu, and L. Zhang, The skew-symmetric orthogonal solutions of the matrix equation AX=B, Linear Algebra Appl. 402 (2005), pp. 303–318.
  • C.I. Mosier, Determining a simple structure when loadings for certain tests are known, Psychometrika 4(2) (1939), pp. 149–162.
  • X.Y. Peng, X.Y. Hu, and L. Zhang, The reflexive and anti-reflexive solutions of the matrix equation AHXB=C, J. Comput. Appl. Math. 200(2) (2007), pp. 749–760.
  • Y. Peng, W. Kong, and B. Yang, Orthogonal extreme learning machine for image classification, Neurocomputing 266 (2017), pp. 458–464.
  • Y. Qiu and A. Wang, Solving balanced procrustes problem with some constraints by eigenvalue decomposition, J. Comput. Appl. Math. 233(11) (2010), pp. 2916–2924.
  • M.B. Rubin and D. Solav, Unphysical properties of the rotation tensor estimated by least squares optimization with specific application to biomechanics, Int. J. Eng. Sci. 103 (2016), pp. 11–18.
  • P.H. Schönemann, A generalized solution of the orthogonal procrustes problem, Psychometrika 31(1) (1966), pp. 1–10.
  • V. Simoncini, Computational methods for linear matrix equations, SIAM Rev. 58(3) (2016), pp. 377–441.
  • J. Snyders and M. Zakai, On nonnegative solutions of the equation AD+DA′=−C, SIAM J. Appl. Math. 18(3) (1970), pp. 704–714.
  • Y. Sun and L. Vandenberghe, Decomposition methods for sparse matrix nearness problems, SIAM J. Matrix Anal. Appl. 36(4) (2015), pp. 1691–1717.
  • Y. Tai, J. Yang, Y. Zhang, L. Luo, J. Qian, and Y. Chen, Face recognition with pose variations and misalignment via orthogonal procrustes regression, IEEE Trans. Image Process. 25(6) (2016), pp. 2673–2683.
  • Y. Tian and Y. Takane, On consistency, natural restrictions and estimability under classical and extended growth curve models, J. Stat. Plan. Inference 139(7) (2009), pp. 2445–2458.
  • J.A. Tropp, A. Yurtsever, M. Udell, and V. Cevher, Practical sketching algorithms for low-rank matrix approximation, SIAM J. Matrix Anal. Appl. 38(4) (2017), pp. 1454–1485.
  • W.J. Vetter, Vector structures and solutions of linear matrix equations, Linear Algebra Appl. 10(2) (1975), pp. 181–188.
  • G. Wahba, A least squares estimate of satellite attitude, SIAM Rev. 7(3) (1965), pp. 409–409.
  • P.C. Young, Recursive Estimation and Time-Series Analysis, 2nd ed., Springer-Verlag, Berlin Heidelberg, 2011.
  • H. Zhao, Z. Wang, and F. Nie, Orthogonal least squares regression for feature extraction, Neurocomputing 216 (2016), pp. 200–207.