Formulae display: $MathJax Logo$ ?Mathematical formulae have been encoded as MathML and are displayed in this HTML version using MathJax in order to improve their display. Uncheck the box to turn MathJax off. This feature requires Javascript. Click on a formula to zoom.

Abstract

A wide class of coupled-cluster methods is introduced, based on Arponen's extended coupled-cluster theory. This class of methods is formulated in terms of a coordinate transformation of the cluster operators. The mathematical framework for the error analysis of coupled-cluster methods based on Arponen's bivariational principle is presented, in which the concept of local strong monotonicity of the flipped gradient of the energy is central. A general mathematical result is presented, describing sufficient conditions for coordinate transformations to preserve the local strong monotonicity. The result is applied to the presented class of methods, which include the standard and quadratic coupled-cluster methods, and also Arponen's canonical version of extended coupled-cluster theory. Some numerical experiments are presented, and the use of canonical coordinates for diagnostics is discussed.

GRAPHICAL ABSTRACT

Keywords:

1. Introduction

It is with delight that the authors dedicate this work to Professor Jürgen Gauß on the occasion of his sixtieth birthday. In the spirit of his pursuit of scientific rigour, especially the attention to detail in coupled-cluster (CC) theory, we here present a mathematical study of some alternative formulations based on Arponen's extended CC (ECC) method [Citation1,Citation2]. The ECC method is defined in terms of the critical points of an energy functional, (1) $E_{ECC} (T, Λ) = ⟨ ϕ_{0}, e^{Λ^{†}} e^{- T} H e^{T} ϕ_{0} ⟩,$ (1) where T and Λ are cluster operators in the usual sense of CC theory. The well-known CC Lagrangian introduced by Helgaker and Jørgensen [Citation3] is obtained by expanding $e^{Λ^{†}}$ to first order. In this sense, the standard CC approach is an approximation to the ECC method.

We will study a collection of methods that generalises this idea, defined by substitution of $e^{Λ^{†}}$ by a Taylor polynomial of fixed degree n. This can be formulated as a coordinate transformation of the cluster amplitudes. A second class of models is obtained by a further coordinate transformation introduced by Arponen to ensure a certain linkedness structure of the energy functional. We refer to these coordinates as canonical, as the time-dependent Schrödinger equation takes the form of Hamilton's equations of motion in this case. Thus, we obtain two hierarchies NC-ECC $(n)$ , using non-canonical coordinates, and C-ECC $(n)$ using canonical coordinates. Our mathematical results imply that when the cluster operators are not truncated, all these models are exact and equivalent to the time-independent Schrödinger equation. Moreover, Galerkin approximations (i.e. generic truncation schemes that can approach the untruncated limit) will converge under certain relatively mild single-reference-type conditions. While the methods discussed here are all expensive (for n>1), we consider them as stepping stones towards producing competitive alternatives to standard CC theory that alleviate deficiencies of the latter, such as the inability to correctly break chemical bonds.

The various forms of CC methods are today among the most widely used for wavefunction-based calculations on manybody systems. The main idea stems from Hubbard's exponential parameterisation of the wavefunction based on cluster operators in manybody perturbation theory [Citation4], which was taken as starting point for ab initio treatments by Coester and Kümmel for nuclear structure calculations in the 1950s [Citation5,Citation6]. The modern form of standard CC theory was developed by, among others, Sinanoğlu, Paldus and Cizek in the 1960s [Citation7] and the CC method with singles, doubles and perturbative triples [CCSD(T)] today constitutes ‘the gold standard of quantum chemistry’ due to its excellent balance between computational cost and accuracy [Citation8]. In nuclear structure calculations the same method has gained traction in the last decade, providing excellent predictive power for light to medium nuclei [Citation9]. Coupled-cluster theory has also been applied to superconductivity [Citation10], lattice gauge theory [Citation11], and systems of trapped bosons such as Bose–Einstein condensates [Citation12]. These examples and the cited works are by no means exhaustive, but serve to illustrate the flexibility of the CC formalism.

In the early 1980s, Arponen introduced a novel concept into CC theory, namely the bivariational principle [Citation1,Citation13], resulting in the ECC method [Citation1,Citation2,Citation14], and an interpretation of standard CC theory and ECC theory as variational methods in a more general sense, i.e. they are bivariational. However, the ECC method has seen little use in chemistry due to its immense complexity, even for truncated versions. In physics, on the other hand, the ECC model has advantages over standard CC theory that can make it very useful. To illustrate, the ECC method correctly describes symmetry breaking in the Lipkin–Meshkov–Glick quasispin model of collective monopole vibrations in nuclei [Citation13,Citation15], in contrast to the standard CC method, which cannot. For the electronic-structure problem in quantum chemistry, the standard CC model fails dramatically to reproduce dissociation curves of even simple dimers like $N_{2}$ , while the ECC method performs quite well [Citation16,Citation17]. Thus, we are of the opinion that the ECC method is still worthwhile to study, and approximate forms may still prove to be useful in quantum chemistry.

The non-canonical and canonical hierarchies (N)C-ECC(n) introduced in this article turn out to be equivalent, and give identical predictions, when truncated with an excitation-rank complete scheme. On the other hand, the working equations are different and in fact cheaper in the canonical case, albeit marginally. An example is the NC-ECC(1)SD method, i.e. the standard CCSD approach, and the C-ECC(1)SD method, which are equivalent. We also raise the question about diagnostics for practical calculations, and show some numerical evidence that diagnostics can favourably be done using canonical coordinates, even if the computations are done in the usual manner using non-canonical variables.

Another well-known special case is NC-ECC(2), the quadratic coupled-cluster (QCC) method introduced by Van Vorhiis and Head-Gordon [Citation18,Citation19], whose canonical and non-canonical versions are equivalent in their doubles-only approximation. We note that the asymptotic cost of QCCD is the same as CCSD, even if it is a higher-order approximation to ECC. Furthermore, the perfect-pairing (PP) hierarchy [Citation20] of amplitude truncation schemes can be applied to our methods. The PP hierarchy are approximations to the complete-active space self-consistent field (CASSCF) method, including only a tiny subset of even-rank amplitudes combined with orbital optimisation, the latter which we disregard here. The corresponding canonical and non-canonical formulations (N)C-ECC(1)PPH are inequivalent. The n>1 versions could also be interesting in their own right, as investigated by Byrd and coworkers in the case of the QCC method [Citation19].

The remainder of the article is organised as follows: In Section 2, we introduce the bivariational principle and the mathematical setting of local analysis of CC methods. The key concept of our analysis is the notion of local strong monotonicity of the flipped gradient of a smooth bivariational energy functional (see Equation (Equation7(7) $F (T, Λ) = (\partial_{Λ} E (T, Λ), \partial_{T} E (T, Λ)),$ (7) )). The usefulness of this property is presented in Theorem 2.1, where local uniqueness and quadratic error estimates are established in a very general setting using Zarantonello's Theorem from nonlinear monotone operator theory [Citation21,Citation22]. Next, Theorem 2.2 summarises the main results of Ref. [Citation23], where strong monotonicity is proven for the non-canonical ECC method. For a recent review on monotonicity in CC theory we refer to [Citation24], where this property is linked to spectral gaps of the systems under study. Section 3 presents the idea of monotonicity-preserving coordinate transformations. Our main result, Theorem 3.1, is a change-of-coordinates result. When combined with Theorems 2.1 and 2.2, the analysis of (N)C-ECC(n) follows in Corollary 3.2. Our tools rely heavily on the functional analytic formulation of cluster operators and the Schrödinger equation developed by Rohwedder and Schneider [Citation25–27]. The results are formulated in a qualitative manner, in the sense that they are indeed rigorous but depend on constants whose numerical (or optimal) values are unknown. We leave further quantitative investigations for future work. In Section 4, we perform some numerical experiments to elucidate some aspects of the (N)C-ECC(n) hierarchies, before we finish with some concluding remarks in Section 5. All the proofs of our results are presented in Appendix.

2. The non-canonical extended coupled-cluster model

2.1. Bivariational principle

The starting point is a generalisation of the Rayleigh–Ritz variational principle to operators that are not necessarily self-adjoint (Hermitian in the finite-dimensional case). For simplicity, we assume a real Hilbert space $H$ . Given a system Hamiltonian $\hat{H} : D (\hat{H}) \to H$ , where $D (\hat{H}) \subset H$ is dense, we define a bivariate Rayleigh quotient, $E_{bivar} : H \oplus H \to R$ , (2) $E_{bivar} (ψ, \tilde{ψ}) = \frac{⟨ \tilde{ψ}, \hat{H} ψ ⟩}{⟨ \tilde{ψ}, ψ ⟩}, ⟨ \tilde{ψ}, ψ ⟩ \neq 0.$ (2) Requiring the functional $E_{bivar}$ to be stationary at $(ψ_{*}, {\tilde{ψ}}_{*})$ with respect to arbitrary variations in the two wavefunctions leads to the conditions $⟨ {\tilde{ψ}}_{*}, ψ_{*} ⟩ \neq 0$ , $\hat{H} ψ_{*} = E_{*} ψ_{*}$ and ${\hat{H}}^{†} {\tilde{ψ}}_{*} = E_{*} {\tilde{ψ}}_{*}$ , with $E_{*} = E_{bivar} (ψ_{*}, {\tilde{ψ}}_{*})$ , i.e. the right and left eigenvalue problem for $\hat{H}$ . If $\hat{H}$ is self-adjoint, the eigenfunctions are identical up to normalisation. The introduction of two independent wavefunctions therefore might seem to complicate matters. However, the bivariate Rayleigh quotient $E$ allows distinct approximations of ψ and $\tilde{ψ}$ , introducing more flexibility for approximate schemes. Moreover, the state defined is a (non-Hermitian) density operator, which is unique, $ρ = \frac{| ψ ⟩ ⟨ \tilde{ψ} |}{⟨ \tilde{ψ} | ψ ⟩} .$ When determined variationally, the Hellmann–Feynman theorem [Citation28] gives well-defined physical predictions in terms of ρ.

As is common in analysis of partial differential equations [Citation29,Citation30], we pass to a weak formulation, which in this case is equivalent to the strong formulation outlined above. Under the assumption that $\hat{H}$ is below bounded, we can introduce a unique extension $H : X \to X^{'}$ (dual space), where $X \subset H$ is a dense subspace, a Hilbert space with norm $‖ \cdot ‖_{X}$ , continuously embedded in $H$ . It follows that $H$ is continuously embedded in $X^{'}$ , and we have a scale of spaces with dense embeddings, $X ↪ H ↪ X^{'}$ . The operator H is bounded (i.e. continuous), and satisfies a Gårding estimate, i.e. for some $α \geq 0$ and some $μ \in R$ , $⟨ ψ, Hψ ⟩ \geq α ‖ ψ ‖_{X}^{2} + μ ‖ ψ ‖^{2}$ for all $ψ \in X$ . For the electronic-structure problem $X$ can be taken to be the space of square-integrable functions with finite kinetic energy.

If $H$ is finite-dimensional, we can set $X \equiv H$ , simplifying matters a lot, and the reader may if she or he wishes stick to this picture for simplicity, where all operators are basically matrices. In the infinite-dimensional case, however, $\hat{H}$ is typically unbounded as an operator over $H$ , and the above construction is necessary.

Under the stated conditions, $E_{bivar} : X \oplus X \to R$ is a (Fréchet) smooth map away from the singularity $⟨ \tilde{ψ}, ψ ⟩ = 0$ , and Taylor series exist and converge locally, allowing a certain degree of intuition to be borrowed from the finite-dimensional case. The right and left Schrödinger equations are then $\partial_{\tilde{ψ}} E_{bivar} (ψ_{*}, {\tilde{ψ}}_{*}) = 0$ and $\partial_{ψ} E_{bivar} (ψ_{*}, {\tilde{ψ}}_{*}) = 0$ , respectively. This is the bivariational principle.

2.2. Exponential ansatz and the ECC method

The standard CC method is formulated relative to a fixed reference $ϕ_{0} \in X$ on determinantal form. By introducing a cluster operator $T = T_{1} + T_{2} + \dots$ with $T_{k}$ containing all excitations of rank k, i.e. of k fermions relative to $ϕ_{0}$ , we have the exact parameterisation $ψ = e^{T} ϕ_{0},$ assuming intermediate normalisation, $⟨ ϕ_{0}, ψ ⟩ = 1$ .

Since all excitations commute, the cluster operators form a commutative Banach algebra under suitable conditions which we now describe [Citation25,Citation26]. We expand the cluster operators using amplitudes and basis operators, i.e. $T = \sum_{μ \in I} τ_{μ} X_{μ}$ , where $X_{μ}$ excites a number $n = n (μ)$ of fermions in the reference into the virtual space, i.e. $X_{μ} = c_{a_{1}}^{†} c_{i_{1}} \dots c_{a_{n}}^{†} c_{i_{n}},$ where the $i_{k}$ are among the occupied orbitals of $ϕ_{0}$ , and $a_{k}$ among the unoccupied orbitals. The set $I$ is the generic set of amplitude indices. We introduce a Hilbert space $V$ with norm $‖ T ‖ = ‖ T ϕ_{0} ‖_{X}$ , which becomes a useful space for formulating abstract CC theory. Fundamental results include that any $T \in V$ is a bounded operator on $X$ , such that, e.g. $\exp (T)$ also is a bounded operator. Moreover, $T^{†}$ is also a bounded operator, which means that we can make sense of, e.g. $\exp (- T) Hexp (T)$ , and that we can represent any intermediately normalised $ψ \in X$ as $ψ = e^{T} ϕ_{0}$ with $T \in V$ unique. Finally, all the elements of the algebra are nilpotent. The Banach algebra structure on $V$ allows CC theory to be rigorously formulated in the full, infinite-dimensional case. This was the approach taken in Ref. [Citation23] for a first analysis of NC-ECC theory.

Again, the finite-dimensional case may by kept in mind: In this case, cluster amplitudes are simply finite-dimensional vectors, and the existence of the exponential parameterisation is a trivial result. There is no need to introduce the norm $‖ T ‖$ , instead the Euclidean norm on the amplitudes may be used.

Any $\tilde{ψ}$ normalised according to $⟨ \tilde{ψ}, ψ ⟩ = 1$ can be represented by introducing a second cluster operator $Λ = Λ_{1} + Λ_{2} + \dots$ , viz., $\tilde{ψ} = e^{- T^{†}} e^{Λ} ϕ_{0} .$ Inserting the parametrization of ψ and $\tilde{ψ}$ into the bivariate Rayleigh quotient, we obtain the energy functional $E_{ECC} : V \oplus V \to R$ of the non-canonical ECC method, given in Equation (Equation1(1) $E_{ECC} (T, Λ) = ⟨ ϕ_{0}, e^{Λ^{†}} e^{- T} H e^{T} ϕ_{0} ⟩,$ (1) ). This map is everywhere smooth, and its critical points $(T_{*}, Λ_{*})$ are equivalent to the Schrödinger equation and its dual: Under the assumption that the eigenfunctions can be normalised according to $⟨ ϕ_{0}, ψ_{*} ⟩ = ⟨ {\tilde{ψ}}_{*}, ψ_{*} ⟩ = 1$ , $ψ_{*}$ and ${\tilde{ψ}}_{*}$ solve the Schrödinger equation and its dual if and only if (3) $\frac{\partial E_{ECC} (T_{*}, Λ_{*})}{∂Λ} = 0 and \frac{\partial E_{ECC} (T_{*}, Λ_{*})}{\partial T} = 0.$ (3) Assuming that the eigenvalue $E_{*} = E_{ECC} (T_{*}, Λ_{*})$ is non-degenerate, $(T_{*}, Λ_{*})$ is easily seen to be locally unique.

2.3. Truncations and monotonicity analysis

The non-canonical ECC energy is just one out of many possible parameterisations of the exact bivariate Rayleigh quotient $E_{bivar}$ . In this section, we take a more abstract approach and consider a general energy functional $E : V \oplus V \to R$ , obtained by some exact parameterisation of $(ψ, \tilde{ψ})$ by means of the space $V \oplus V$ , i.e. by a pair of cluster operators $(T, Λ)$ . We will discuss several such functionals in Section 3, obtained from the NC-ECC functional by coordinate transformations.

Only in rare cases can the amplitude equations (Equation3(3) $\frac{\partial E_{ECC} (T_{*}, Λ_{*})}{∂Λ} = 0 and \frac{\partial E_{ECC} (T_{*}, Λ_{*})}{\partial T} = 0.$ (3) ) be solved exactly. Introduce therefore a discretised space $V_{d} \subset V$ of finite dimension by truncating the amplitude index set $I_{d} \subset I$ , that is, $T_{d} \in V_{d}$ if and only if (4) $T_{d} = \sum_{μ \in I_{d}} τ_{d, μ} X_{μ} \in V_{d} .$ (4) The set $I_{d}$ is typically defined by the restriction of the excitations to a finite virtual space (a finite basis), and to a finite excitation rank (smaller than the number of electrons). In the chemistry literature, the excitation hierarchy for a given basis is traditionally denoted singles (S), doubles (D), and so on. In the ECC literature, one typically speaks of the SUBn approximation, with n being the maximum rank.

When the discrete space is established, we define a discrete solution by the stationary conditions of the restricted energy function $E_{d} = E ↾_{V_{d} \oplus V_{d}}$ . The stationary equations take the form (5) $\frac{\partial E (T_{d *}, Λ_{d *})}{\partial λ_{μ}} = \frac{\partial E (T_{d *}, Λ_{d *})}{\partial τ_{μ}} = 0,$ (5) for all $μ \in I_{d}$ .

It is not necessary to use the traditional truncation scheme outlined here; any increasing sequence of subspaces $V_{d} \subset V$ , with d a parameter, that can approximate elements in $V$ arbitrarily well by increasing d can be used. We let $dist (v, V_{d})$ be the distance from v to $V_{d}$ measured with respect to the norm of $V$ . Consequently, for all $v \in V$ we have $dist (v, V_{d}) \to 0$ as $d \to + \infty$ . Such a sequence of spaces is referred to as a Galerkin sequence. Other options than the traditional truncation schemes are explicitly correlated methods [Citation31] and complete-active space methods [Citation20,Citation32,Citation33] such as the PP hierarchy.

An often overlooked point in the physics literature is the fact that convergence of the equations does not in general imply convergence of their solutions. An important question is therefore whether the discrete critical points $(T_{d *}, Λ_{d *})$ converge to the exact critical points $(T_{*}, Λ_{*})$ as $d \to + \infty$ . This would imply that the energy converges too, and in a quadratic manner due to the critical point formulation.

Monotonicity is an important notion in connection with the local analysis of the CC method and its variations [Citation23–27,Citation34]. The use of monotonicity in the analysis of the standard CC method was introduced by Schneider and Rohwedder [Citation25,Citation27]. The intuition behind (strict) monotonicity is that of an everywhere (strictly) increasing (or decreasing) function. The function that one studies is typically a root problem such as the CC amplitude equations $f (T) = 0$ , where $f_{μ} (T) = ⟨ ϕ_{μ}, e^{- T} H e^{T} ϕ_{0} ⟩$ . Monotonicity allows the establishment of locally unique solutions of the Galerkin problem and is therefore important for the motivation of numerical implementations. As such it is a fundamental result of the CC method's practical usage in quantum chemistry. It also connects spectral gaps, e.g. HOMO-LUMO gap, to stability constants within the analysis [Citation24]. (See also the steerable CAS-ext gap connected to the tailored CC method [Citation35] that treats quasi-degenerate systems [Citation36].)

The particular monotonicity property that is key for this presentation is an even stronger version than that of strict monotonicity, and is called strong monotonicity. It is defined as follows: A finite-dimensional vector-valued function $F (Z)$ is locally strongly monotone near some $Z_{*}$ if for $Z_{1}, Z_{2}$ in a neighbourhood of $Z_{*}$ we have (6) $⟨ F (Z_{1}) - F (Z_{2}), Z_{1} - Z_{2} ⟩ \geq η ‖ Z_{1} - Z_{2} ‖^{2},$ (6) for some constant $η > 0$ . (In the infinite-dimensional case $⟨ \cdot, \cdot ⟩$ is the dual pairing, which then becomes an infinite sum, see Ref. [Citation23] for more details.)

Furthermore, we need the concept of Lipschitz continuity: F is locally Lipschitz continuous with constant L>0 if $‖ F (Z_{1}) - F (Z_{2}) ‖ \leq L ‖ Z_{1} - Z_{2} ‖ .$ In particular, any (Fréchet) smooth function is locally Lipschitz continuous, and so are all its derivatives.

The map F that we will study is the flipped gradient of the general energy functional $E : V \oplus V \to R$ , defined as (7) $F (T, Λ) = (\partial_{Λ} E (T, Λ), \partial_{T} E (T, Λ)),$ (7) or more compactly $F (T, Λ) = R \partial E (T, Λ)$ , with R being the map that exchanges the partial derivatives. The motivation is as follows: If we consider the bivariate Rayleigh quotient, $\partial E_{bivar}$ is not locally strongly monotone, as its critical points are saddle points. On the other hand, the flipped gradient $F_{bivar} = R \partial E_{bivar}$ can be seen to be locally strongly monotone near the ground state, given that this ground state is non-degenerate with a nonzero spectral gap to the remaining spectrum. It is natural to expect that one can find conditions such that the flipped gradient of the energy when expressed in new coordinates is locally strongly monotone.

The following is a central result, adapting a result due to Zarantonello [Citation21,Citation22] (points 1 and 2) to the present notation and setting, and applied to the flipped gradient of an energy functional (point 3).

Theorem 2.1

General convergence and error estimates

Let $F : V \oplus V \to V^{'} \oplus V^{'}$ be a map, and let $U \subset V \oplus V$ be an open ball containing a $Z_{*}$ such that $F (Z_{*}) = 0$ .

Let $V_{d} \subset V$ be a Galerkin sequence of subspaces with $P_{d}$ being the orthogonal projector onto $V_{d} \oplus V_{d}$ . Furthermore, let $F_{d} : V_{d} \oplus V_{d} \to V_{d}^{'} \oplus V_{d}^{'}$ be the Galerkin discretisation of F, i.e. $F_{d} (Z_{d}) = P_{d} F (Z_{d})$ .

Assume that F is locally strongly monotone with constant $η > 0$ and Lipschitz continuous with constant L>0 on U. Then, the following holds:

(1)	$Z_{*}$ is the only root in U.
(2)	There is a sufficiently large $d_{0}$ , such that for any $d > d_{0}$ , there exists $Z_{d } \in V_{d} \oplus V_{d}$ such that $F_{d} (Z_{d }) = 0$ . This root is unique in U and we have the following error estimate (quasi-optimality of the discrete solution): (8) $‖ Z_{* d} - Z_{} ‖ \leq \frac{L}{η} dist (Z_{}, V_{d} \oplus V_{d}) .$ (8)

Let $E : V \oplus V \to R$ , $Z \mapsto E (Z)$ be a (Fréchet) smooth energy functional. Let R be the flipping map as introduced after Equation (Equation7(7) $F (T, Λ) = (\partial_{Λ} E (T, Λ), \partial_{T} E (T, Λ)),$ (7) ) and set $F = R \partial E$ , and $E_{*} = E (Z_{*})$ .

(3)

For $d > d_{0}$ , the discrete Galerkin equations $\partial E_{d} (Z_{* d}) = 0$ have locally unique solutions, and in addition to the error estimate (Equation8(8) $‖ Z_{* d} - Z_{*} ‖ \leq \frac{L}{η} dist (Z_{*}, V_{d} \oplus V_{d}) .$ (8) ), we have the energy error (9) $\begin{aligned} | E (Z_{* d}) - E_{*} | & \leq C ‖ Z_{* d} - Z_{*} ‖^{2} \\ \leq C {(\frac{L}{η})}^{2} dist (Z_{*}, V_{d} \oplus V_{d})^{2} . \end{aligned}$ (9)

The proof is presented in Appendix. The error estimate (Equation9(9) $\begin{aligned} | E (Z_{* d}) - E_{*} | & \leq C ‖ Z_{* d} - Z_{*} ‖^{2} \\ \leq C {(\frac{L}{η})}^{2} dist (Z_{*}, V_{d} \oplus V_{d})^{2} . \end{aligned}$ (9) ) shows that for (smooth) energy functionals with a locally strongly monotone flipped gradient, the bivariational method of discretisation behaves very similar to the usual Rayleigh–Ritz variational method of discretisation. As we enlarge the Galerkin space, the discrete ground state converges, and the energy error is quadratic in the error of the state. However, we cannot guarantee convergence from above, but this is much less important than actually having a quadratic error.

An interesting fact is that Brouwer's fixed point theorem [Citation37] can be used to obtain a sufficient condition for the constant $d_{0}$ , where quadratic convergence sets in, namely (10) $dist (Z_{*}, V_{d_{0}} \oplus V_{d_{0}}) < \frac{δ}{1 + L / η},$ (10) see Refs. [Citation23,Citation26]. The radius δ of the domain U in Theorem 2.1 is unknown in general, and we see that a small monotonicity constant η relative to L will be severely detrimental for the convergence, as it forces $d_{0}$ to be very large. On the other hand, in general one has that η and the Lipschitz constant L are related by $η \leq L$ . The optimal value of the right-hand side is therefore $δ / 2$ .

The following theorem summarises the main results of Ref. [Citation23], where the proof and more details can be found:

Theorem 2.2

NC-ECC monotonicity

Assume that the system Hamiltonian $\hat{H}$ is self-adjoint, and that the ground state of $\hat{H}$ exists, is non-degenerate, and that there is a spectral gap $γ > 0$ between the ground-state energy $E_{*}$ and the rest of the spectrum. Assume that the reference $ϕ_{0}$ is such that it is not orthogonal to the ground-state wavefunction. Let $Z_{*} = (T_{*}, Λ_{*}) \in V \oplus V$ be the corresponding critical point of $E_{ECC}$ , and assume that $T_{*}$ and $Λ_{*}$ are not too large, i.e. that $ϕ_{0}$ is a sufficiently good approximation to $ψ_{*}$ . Then, $F = R \partial E_{ECC}$ is locally strongly monotone near $Z_{*}$ with a constant $η = Cγ$ , for some C<1.

The consequence of Theorem 2.2, when combined with Theorem 2.1, is that the non-canonical ECC method is convergent as the discrete cluster amplitude space $V_{d}$ approaches the untruncated limit. As already remarked, we do not explicitly know the onset $d_{0}$ of quadratic convergence.

3. Monotonicity-preserving coordinate transformations

3.1. A class of exact coupled-cluster models

In addition to the non-canonical ECC parameterisation, Arponen also considered a second parameterisation of the bra and ket wavefunctions, which gives equations of motion for the time-dependent Schrödinger equation that are canonical in the sense of Hamiltonian mechanics [Citation1,Citation2]. (This must not be confused with the use of canonical Hartree–Fock orbitals, which is unrelated.) This parameterisation is given in terms of a coordinate transformation $θ_{C - ECC} : V \oplus V \to V \oplus V$ as (11) $(T, Λ) = θ_{C - ECC} (T^{'}, Λ^{'}),$ (11) where $Λ^{'} = Λ$ , and $T = S (T^{'}; Λ^{'})$ is defined by (12) $QT ϕ_{0} = Q e^{- {Λ^{'}}^{†}} T^{'} ϕ_{0}, Q = I - | ϕ_{0} ⟩ ⟨ ϕ_{0} | .$ (12) This function has inverse $Q T^{'} ϕ_{0} = Q e^{{\tilde{Λ}}^{†}} T ϕ_{0}$ . (In Arponen's work [Citation2], the notation $(T^{'}, Λ^{'}) = (Σ, {\tilde{Σ}}^{†})$ is used.) The map $θ_{C - ECC}$ is smooth and invertible with a smooth inverse, and we therefore obtain a new exact energy functional $E_{C - ECC} = E_{ECC} \circ θ_{C - ECC},$ with values (13) $E_{C - ECC} (T^{'}, Λ^{'}) = ⟨ ϕ_{0}, e^{(Λ^{'})^{†}} e^{- S (T^{'}; Λ^{'})} H e^{S (T^{'}; Λ^{'})} ϕ_{0} ⟩ .$ (13) A remarkable consequence of this second parameterisation is that it corresponds to retaining only those terms in Equation (Equation1(1) $E_{ECC} (T, Λ) = ⟨ ϕ_{0}, e^{Λ^{†}} e^{- T} H e^{T} ϕ_{0} ⟩,$ (1) ) that can be represented by ‘doubly linked’ diagrams [Citation1,Citation2], (14) $E_{C - ECC} (T^{'}, Λ^{'}) = {⟨ ϕ_{0}, e^{(Λ^{'})^{†}} (H e^{T^{'}})_{C} ϕ_{0} ⟩}_{DL} .$ (14) The phrase ‘doubly linked’ means that every power of $(Λ^{'})^{†}$ is connected to two $T^{'}$ operators on its right, unless it is connected directly to H. The subscript ‘C’ for ‘connected’ is the usual connectedness criterion on contractions between H and powers of $T^{'}$ [Citation8]. Thus, the canonical coordinates represent a more compact representation in that the resulting tensor contractions or diagrams in the energy are identical to those obtained in the NC-ECC energy (Equation1(1) $E_{ECC} (T, Λ) = ⟨ ϕ_{0}, e^{Λ^{†}} e^{- T} H e^{T} ϕ_{0} ⟩,$ (1) ), except for some diagrams that are explicitly eliminated.

Similarly, for the standard CC method, Arponen introduced the coordinate transformation $θ_{CC}$ given by (15) $(T, Λ) = θ_{CC} (T^{'}, Λ^{'}) = (T^{'}, e^{Λ^{'}} - 1) .$ (15) We obtain the energy functional $E_{CC} = E_{ECC} \circ θ_{CC}$ , where (16) $E_{CC} (T^{'}, Λ^{'}) = ⟨ ϕ_{0}, (1 + (Λ^{'})^{†}) e^{- T^{'}} H e^{T^{'}} ϕ_{0} ⟩,$ (16) that is, the standard CC Lagrangian [Citation3]. Incidentally, the standard CC coordinates are also canonical.

The map $θ_{CC}$ can be generalised to Taylor polynomials. By setting $e^{Λ} = (e^{Λ^{'}})_{n} \equiv 1 + Λ^{'} + \frac{1}{2} (Λ^{'})^{2} + \dots + \frac{1}{n!} (Λ^{'})^{n},$ we can solve for Λ in terms of $Λ^{'}$ by, e.g. considering first the singles, then doubles, etc., giving a smooth map $G_{n} : V \to V$ such that $e^{G_{n} (Λ^{'})} = (e^{Λ^{'}})_{n}$ . In fact, since the cluster operators are nilpotent, $G_{n} (Λ^{'}) = \ln [(e^{Λ^{'}})_{n}]$ , where the logarithm is expanded in a (finite) Taylor series around the identity. Similarly, we can solve for $Λ^{'}$ in terms of Λ, demonstrating that this map has an inverse, and in fact that this inverse is smooth. We obtain a coordinate transformation $θ_{n}$ given by (17) $(T, Λ) = θ_{n} (T^{'}, Λ^{'}) = (T^{'}, G_{n} (Λ^{'})),$ (17) and the corresponding energy functional (18a) $E_{NC - ECC (n)} (T^{'}, Λ^{'}) = ⟨ ϕ_{0}, (e^{(Λ^{'})^{†}})_{n} e^{- T^{'}} H e^{T^{'}} ϕ_{0} ⟩ .$ (18a) Coordinate transformations form a group, and may thus be composed. By combining $θ_{C - ECC (n)} = θ_{n} \circ θ_{C - ECC}$ , we obtain an energy functional (18b) $\begin{aligned} E_{C - ECC (n)} (T^{'}, Λ^{'}) \\ = ⟨ ϕ_{0}, (e^{(Λ^{'})^{†}})_{n} e^{- S (T^{'}; Λ^{'})} H e^{S (T^{'}; Λ^{'})} ϕ_{0} ⟩ . \end{aligned}$ (18b) Since, in the NC-ECC energy functional (Equation1(1) $E_{ECC} (T, Λ) = ⟨ ϕ_{0}, e^{Λ^{†}} e^{- T} H e^{T} ϕ_{0} ⟩,$ (1) ), an exponential $e^{- Λ^{†}}$ can be inserted after $e^{T}$ without changing the result, both of these hierarchies correspond to truncations of a Baker–Campbell–Hausdorff expansion at order n, and are thus manifestly extensive.

3.2. Coordinate transformation theorem

Equations (Equation18a(18a) $E_{NC - ECC (n)} (T^{'}, Λ^{'}) = ⟨ ϕ_{0}, (e^{(Λ^{'})^{†}})_{n} e^{- T^{'}} H e^{T^{'}} ϕ_{0} ⟩ .$ (18a) ) and (Equation18b(18b) $\begin{aligned} E_{C - ECC (n)} (T^{'}, Λ^{'}) \\ = ⟨ ϕ_{0}, (e^{(Λ^{'})^{†}})_{n} e^{- S (T^{'}; Λ^{'})} H e^{S (T^{'}; Λ^{'})} ϕ_{0} ⟩ . \end{aligned}$ (18b) ) represent two hierarchies of exact parameterisations of the bivariate Rayleigh quotient. It is therefore of interest to determine whether they have locally strongly monotone flipped gradients. To establish this, we study the effect on local strong monotonicty of a coordinate transformation.

Theorem 3.1

Coordinate transformations

Let $E : V \oplus V \to R$ be a smooth energy functional, let $Z_{*}$ be a critical point, and assume that $F = R \partial E$ is locally strongly monotone near $Z_{*}$ with constant $η > 0$ . Let a smooth $θ : V \oplus V \to V \oplus V$ with a smooth inverse be a given coordinate transformation, and let $E_{θ} = E \circ θ$ be the energy functional expressed in the new coordinates. Let $W_{*} = θ^{- 1} (Z_{*})$ be the corresponding critical point for $E_{θ}$ , and let $F_{θ} = R \partial E_{θ}$ be its flipped gradient. Let $M_{*} = \partial θ (W_{*})$ be the Jacobian at $W_{*}$ . Then we have the following conclusions:

(1)	If $M_{} R = R M_{}$ , then $F_{θ}$ is locally strongly monotone near $W_{}$ with constant $‖ M_{}^{- 1} ‖^{- 2} η$ .
(2)	In the noncommuting case, if $m_{} = M_{} - I$ is sufficiently small, $F_{θ}$ is locally strongly monotone near $W_{}$ with constant $η^{'} = η ‖ M_{}^{- 1} ‖^{- 2} - C (I + ‖ m_{} ‖) ‖ m_{} ‖,$ where C is the constant from Theorem 2.1(3).

Theorem 3.1 tells us that if we create a new method by changing coordinates in a sufficiently well-behaved manner, the monotonicity of the flipped gradient of the energy will be preserved. Hence, the new method will be convergent in the sense of Theorem 2.1, if the original method is. This gives us great flexibility in choosing new cluster-operator parameterisations, since we already know that non-canonical ECC is convergent. The interested reader can find the proof in the Appendix.

3.3. Monotonicity of (N)C-ECC $(n)$ models

We apply Theorem 3.1 to the maps $θ_{n}$ and $θ_{n} \circ θ_{C - ECC}$ that define the NC-ECC $(n)$ and C-ECC $(n)$ models, respectively. The conclusion is as follows (with proof given in the Appendix):

Corollary 3.2

For any of the NC-ECC $(n)$ or C-ECC $(n)$ models, the assumption that the ground-state critical point $W_{*} = (T_{*}^{'}, Λ_{*}^{'})$ is not too large together with a spectral gap $γ > 0$ is sufficient to guarantee local strong monotonicity of the flipped gradient of the energy, and hence a quasi-optimal solution to the Galerkin problem and a quadratic error estimate for the energy.

We note that our estimates are probably pessimistic for some of the models covered here. The analysis starts with a given monotonicity constant η for the NC-ECC scheme, depending on the spectral gap, and consistently produces an $η^{'} < η$ for the method obtained using the coordinate change, worsening the error estimates. It may well be that a direct analysis of the secondary method yields a better $η^{'}$ . The assumptions on the Hamiltonian and reference may also become milder. However, the important point here is that Theorem 3.1 does guarantee that the new method is convergent under some reasonable conditions. For example, we have now proven that QCC theory [Citation18] is convergent if the reference is a sufficiently good approximation to the ground state, and using Equation (Equation10(10) $dist (Z_{*}, V_{d_{0}} \oplus V_{d_{0}}) < \frac{δ}{1 + L / η},$ (10) ) also a basic means to study which truncations or Galerkin schemes can be reasonable, at least in principle. It may be interesting to see whether truncation schemes like the PP hierarchy with orbital optimisation, also in a quadratic n = 2 or higher formulation [Citation19], can be further analysed based on our results.

In the proof of Theorem 3.1 (see Appendix), it arises naturally that the most favourable coordinate transformations are those that commute with the flipping map R, since local strong monotonicity then follows with no assumptions on the Jacobian of the map. The Jacobian commutes with R if and only if (19) $\frac{\partial T}{\partial T^{'}} = \frac{∂Λ}{\partial Λ^{'}}, and \frac{\partial T}{\partial Λ^{'}} = \frac{∂Λ}{\partial T^{'}},$ (19) and one must assume that this holds at every point in $V \oplus V$ , as one does not know a priori where the critical point is. It is not clear what such transformations in general look like, and whether such transformations are useful reparameterisations of the energy.

3.4. Properties of the canonical and non-canonical schemes

The coordinate transformation $θ_{C - ECC}$ as represented by Equation (EquationA7(A7) $A (Λ^{'}) T^{'} = \sum_{μ, ν \in I} X_{μ} ⟨ ϕ_{μ}, e^{- (Λ^{'})^{†}} ϕ_{ν} ⟩ ⟨ ϕ_{ν}, T^{'} ϕ_{0} ⟩ .$ (A7) ) is such that when applied to a cluster operator $T_{k}^{'}$ of rank k, it generates terms $T_{k^{'}}$ with $k^{'} \leq k$ . The same is true for the inverse map. Thus, if $V_{d}$ is excitation-rank complete, i.e. it contains all excitations of rank up to and including some k>0, then the Galerkin discretisation of the NC-ECC(n) method commutes with changing coordinates via the coordinate map $θ_{C - ECC}$ , $E_{C - ECC (n), d} = E_{NC - ECC (n), d} \circ θ_{C - ECC} .$ By inspection, one can also see that this holds for a doubles-only truncation, since $A (Λ^{'}) = I$ in this case. We obtain the following result:

Theorem 3.3

Equivalence of canonical and non-canonical coordinates

Let $V_{d}$ be excitation-rank complete or consist of doubles excitations only, and let n be given. Then, the discrete solutions $(T_{* d}, Λ_{* d})$ and $(T_{* d}^{'}, Λ_{* d}^{'})$ of the NC-ECC(n) and C-ECC(n) methods, respectively, are equivalent and related via $θ_{C - ECC}$ , i.e. $T_{* d} = A (Λ_{* d}^{'}) T_{* d}^{'}$ and $Λ_{* d} = Λ_{* d}^{'}$ .

We stress that, if $V_{d}$ is not excitation-rank complete, the canonical and non-canonical parameterisations are not equivalent. This would be the case for the PP hierarchy of truncations [Citation19,Citation20].

According to the doubly linked structure of the energy functional $E_{C - ECC (n)}$ , see the discussion after Equation (Equation14(14) $E_{C - ECC} (T^{'}, Λ^{'}) = {⟨ ϕ_{0}, e^{(Λ^{'})^{†}} (H e^{T^{'}})_{C} ϕ_{0} ⟩}_{DL} .$ (14) ), the amplitude equations for the canonical case are cheaper, albeit by a small amount. Moreover, it is reasonable to expect that the canonical solution $(T_{* d}^{'}, Λ_{* d}^{'})$ is more compact compared to $(T_{* d}, Λ_{* d})$ . We investigate this claim numerically in Section 4.

4. Numerical results

4.1. Implementation

The (un-)truncated (N)C-ECC(n)SD Equations (Equation18(18a) $E_{NC - ECC (n)} (T^{'}, Λ^{'}) = ⟨ ϕ_{0}, (e^{(Λ^{'})^{†}})_{n} e^{- T^{'}} H e^{T^{'}} ϕ_{0} ⟩ .$ (18a) ) and (Equation5(5) $\frac{\partial E (T_{d *}, Λ_{d *})}{\partial λ_{μ}} = \frac{\partial E (T_{d *}, Λ_{d *})}{\partial τ_{μ}} = 0,$ (5) ), together with the untruncated coordinate transformation Equation (EquationA7(A7) $A (Λ^{'}) T^{'} = \sum_{μ, ν \in I} X_{μ} ⟨ ϕ_{μ}, e^{- (Λ^{'})^{†}} ϕ_{ν} ⟩ ⟨ ϕ_{ν}, T^{'} ϕ_{0} ⟩ .$ (A7) ) have been implemented in a local full CI-based program, i.e. all intermediates are expressed as vectors in the full CI basis. To this end, the C-ECC(n) amplitudes are computed using transformed NC-ECC(n) residual expressions [Citation17]. The C-ECC(n) amplitude equations are thus: (20a) $\begin{aligned} 0 & = \sum_{ν \in I} ⟨ ϕ_{ν}, e^{- (Λ^{'})^{†}} ϕ_{μ} ⟩ ⟨ ϕ_{0}, (e^{(Λ^{'})^{†}})_{n} [H^{S}, X_{ν}] ϕ_{0} ⟩, \end{aligned}$ (20a) (20b) $\begin{aligned} 0 & = ⟨ ϕ_{μ}, (e^{(Λ^{'})^{†}})_{n - 1} H^{S} ϕ_{0} ⟩ \\ - \sum_{ν \in I} ⟨ ϕ_{μ}, X_{ν}^{†} e^{- (Λ^{'})^{†}} T^{'} ϕ_{0} ⟩ ⟨ ϕ_{0}, (e^{(Λ^{'})^{†}})_{n} [H^{S}, X_{ν}] ϕ_{0} ⟩, \end{aligned}$ (20b) where $H^{S} = e^{- S (T^{'}; Λ^{'})} H e^{S (T^{'}; Λ^{'})}$ . The sparsity of the coordinate transformation Equation (EquationA7(A7) $A (Λ^{'}) T^{'} = \sum_{μ, ν \in I} X_{μ} ⟨ ϕ_{μ}, e^{- (Λ^{'})^{†}} ϕ_{ν} ⟩ ⟨ ϕ_{ν}, T^{'} ϕ_{0} ⟩ .$ (A7) ) has been exploited throughout, e.g. only singles amplitudes are transformed in the case where $V_{d}$ contains only singles and doubles. The coupled amplitude equations (Equation20(20a) $\begin{aligned} 0 & = \sum_{ν \in I} ⟨ ϕ_{ν}, e^{- (Λ^{'})^{†}} ϕ_{μ} ⟩ ⟨ ϕ_{0}, (e^{(Λ^{'})^{†}})_{n} [H^{S}, X_{ν}] ϕ_{0} ⟩, \end{aligned}$ (20a) ) are solved iteratively starting from an MP2-guess, using an alternating scheme and applying DIIS convergence acceleration. In all computations, residuals and energies were converged to a threshold of $10^{- 4}$ and $10^{- 6}$ a.u., respectively. The (N)C-ECC(1)SD and (N)C-ECC(∞)SD implementations are verified by reproducing the ‘CCSD’ and ‘ECCSD’ energies presented in [Citation17].

4.2. Numerical experiments

The (N)C-ECC(n)SD and (N)C-ECC(n)DT models have been studied numerically by investigating the potential energy curves of the hydrogen fluoride molecule with interatomic distances $1.0 \leq R_{H - F} \leq 3.5 (a_{0})$ in a DZV basis set [Citation17] and the nitrogen molecule with $1.5 \leq R_{N - N} \leq 4.0 (a_{0})$ with a frozen core in a 6-31G basis [Citation38]. Additionally, the H $_{8}$ model system with structural parameters $0.0001 \leq α \leq 1.0$ in a minimal basis set [Citation32,Citation39] is considered. For large distances R and small α, respectively, these systems comprise significant multireference character, i.e. the weight of the Hartree–Fock configuration in the full CI wave function, $| ⟨ ϕ_{0}, ψ^{FCI} ⟩ |^{2}$ , is fairly small. Thus, these species are good candidates to study novel quantum chemical methods.

The energy curves of the canonical models C-ECC(n)SD are identical to the non-canonical NC-ECC(n)SD ones and are thus not presented here. However, the results differ if excitation-rank incomplete truncation schemes are employed. For instance, in canonical ECC(n)DT, singles amplitudes are effectively generated from doubles and triples amplitudes, while these are absent in the non-canonical model. This effect has been studied on the potential curve of the HF molecule and is depicted in Figure : The generation of singles amplitudes entails that the canonical computation is lower in energy, in particular towards the multireference region where these contribute significantly to the wave function expansion. This depends, however, on the role of the singles amplitudes in the wave function: In test computations on the H $_{8}$ model system a different trend was observed, consistent with the diminished importance of singles in the wave function (vide infra). Therefore, we cannot conclude that the canonical coordinates are consistently better when unconventional truncations are used.

Figure 1. Difference between a canonical and a non-canonical ECC(n)DT computation for the potential curve of HF.

In order to investigate the effect of using different coordinates in ECC(n)SD computations, we calculated a set of CC diagnostics which are often used to assess the quality of CC computations [Citation40,Citation41]. These are based on the largest singular value ( $D_{1}$ ) and Frobenius norm ( $T_{1}$ ) of the matrix representation of the singles amplitudes. (Equivalently, $T_{1}^{2} = ‖ τ_{* 1} ‖_{2}^{2} / N$ , the sum of the squares of the singles amplitudes, with N the number of correlated electrons.) Although diagnostics based on doubles amplitudes are preferred, they are not as available in implementations as are the singles-based variants [Citation42]. Additionally, we computed the diagnostic $| | τ_{*} | |_{2}^{2} / (| | τ_{*} | |_{2}^{2} + 1)$ which involves all the amplitudes. This choice can be motivated from monotonicity arguments and will be discussed in a forthcoming paper.

The values have been computed for truncation schemes $n = 1, 2, \infty$ . Since the trends are very similar, only the data for n = 1 is presented for HF and H $_{8}$ . For N $_{2}$ , we focus on the results for n = 2, because the quadratic model is well-known to be a better approximation to full CI compared to standard CCSD in this case [Citation16] (cf. line starting upper left with triangle markers in Figure ). We stress, that even if the NC-ECC(1)SD and C-ECC(1)SD (which is standard CCSD) are equivalent, the singles amplitudes in their respective parameterisations differ, producing different values for the diagnostics. Figure shows the diagnostics correlated with the multireference character for the HF potential curve, in Figures and values for the H $_{8}$ model and the N $_{2}$ molecule are shown, respectively. In both H $_{8}$ and N $_{2}$ , electron correlation is dominated by doubles amplitudes, as can be seen from the small values of the singles-based diagnostics. Since the reparameterisation of the amplitudes in the canonical model does not affect the amplitudes of highest excitation rank, the difference between the NC-ECC(n)SD and C-ECC(n)SD amplitude vectors is negligible. This is different for the HF case. Here, the amplitude norms of the canonical models are consistently smaller than the non-canonical variants, indicating that the wave function parameterisation is more compact.

Figure 2. Comparison of CC diagnostics of the C-ECC(1)SD and NC-ECC(1)SD model for the HF potential curve correlated with the multireference character.

Figure 3. Comparison of CC diagnostics of the C-ECC(1)SD and NC-ECC(1)SD model for the H $_{8}$ potential curve correlated with the multireference character.

Figure 4. Comparison of CC diagnostics of the C-ECC(2)SD and NC-ECC(2)SD model (lower set of five curves, almost on top of each other) for the N $_{2}$ potential curve correlated with the multireference character. The red curve shows diagnostics for NC-ECC(1)SD, indicating the inferiority of this model in the multireference region.

Our numerical experiments suggest, that for excitation-rank incomplete models, the canonical map generates effectively an excitation-rank complete parameterisation, but does not necessarily yield significantly better results. Concerning the a priori excitation-rank complete models, it has been found that the canonical parameterisation can be more compact compared to the non-canonical one, a desired property for post-Hartree–Fock methods. In particular, it may be useful to consider using canonical coordinates for diagnostics, even for well-established numerical codes using standard CC formalism.

5. Concluding remarks

In this article, we formulated basic error estimates for a class of exact models, defined in terms of replacing, in Arponen's ECC method, the exact exponential $e^{Λ^{†}}$ of the dual cluster operator with an n-th Taylor polynomial, the canonical C-NCC(n) models and the non-canonical NC-ECC(n) models. The central result was a coordinate-transformation theorem, Theorem 3.1, that gives error estimates for any method that can be described as a coordinate transformation of ECC theory. Notably, these results guarantee asymptotically quadratic error estimates for the ground-state energy of all models, under certain mild conditions.

Apart from Theorem 3.1, a basically self-contained mathematical framework for local error analysis of coupled-cluster methods was presented. This was based on Arponen's bivariational principle and basic results from nonlinear monotone operator theory, i.e. Zarantonello's theorem. Also central was our prior analysis of Arponen's extended coupled-cluster method in its non-canonical formulation.

The methods covered by our analysis include standard CC theory, quadratic CC theory [Citation18,Citation19], the perfect-pairing hierarchy [Citation20] for approximating CASSCF, also in its quadratic version [Citation19], and, Arponen's canonical ECC method. The computational cost of the (N)C-ECC $(n)$ methods truncated with the standard singles, doubles, etc., scheme, are expensive: already the nonlinear terms in $(Λ^{†})^{2}$ pushes the cost beyond standard CC theory when higher than doubles are considered. However, having obtained exact mathematical characterisations of the hierarchies of methods is an important first step in producing cheaper and more reliable methods compared to standard CC. One such approach could be similar to the CC2 method of Christiansen and coworkers [Citation43], where the cost of CCSD is reduced from $O (N^{6})$ to $O (N^{5})$ , $N$ being the system size.

The error estimates are not optimal for many methods. A direct analysis of canonical ECC would probably provide the most optimistic analysis for all the C-ECC(n) methods, due to the doubly linked structure and the equivalence of excitation-rank complete Galerkin discretisations.

Finally, we performed some simple numerical experiments, focusing on the possibility of using canonical coordinates in place of the usual CC amplitudes when doing diagnostic estimates on CC calculations on systems with multireference character. Our preliminary findings support the hypothesis that the canonical coordinates are more compact compared to the usual coordinates, providing more accurate diagnostics.

An interesting extension of the present work would be to study truncations where singles-amplitudes are replaced by orbital rotations, either unitary or biorthogonal, as in the QCC and PP approaches, or the non-orthogonal orbital optimised coupled-cluster method of Pedersen and coworkers. [Citation44]. Moreover, the complete-active space coupled-cluster method by Adamowicz and coworkers [Citation32] fits the present scheme. It is also known that quadratic CC and ECC in general are quite good at reproducing multireference character, while standard single-reference CC is quite poor at this. Thus, a modified analysis of the ECC method that includes multireference assumptions, such as the steerable CAS-ext gap of Ref. [Citation34], could potentially lead to a deeper understanding of how CC methods generally behave in the presence of static correlation.

Acknowledgments

The authors are thankful to M. A. Csirik for useful comments.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

This work has received funding from the Research Council of Norway (RCN) [CoE grant numbers 287906 and 262695] (Hylleraas Centre for Quantum Molecular Sciences), and from ERC-STG-2014 [grant number 639508].

References

J.S. Arponen, Ann. Phys. 151 (2), 311–382 (1983). doi:10.1016/0003-4916(83)90284-1.
Web of Science ®Google Scholar
J.S. Arponen, R.F. Bishop and E. Pajanne, Phys. Rev. A 36, 2519–2538 (1987). doi:10.1103/PhysRevA.36.2519.
PubMed Web of Science ®Google Scholar
T. Helgaker and P. Jørgensen, Adv. Quant. Chem. 19, 183–245 (1988). doi:10.1016/S0065-3276(08)60616-4.
Google Scholar
J. Hubbard, Proc. Roy. Soc. A: Math. Phys. Eng. Sci. 240 (1223), 539–560 (1957). doi:10.1098/rspa.1957.0106.
Google Scholar
F. Coester, Nucl. Phys. 7, 421–424 (1958). doi:10.1016/0029-5582(58)90280-3.
Web of Science ®Google Scholar
F. Coester and H. Kümmel, Nucl. Phys. 17, 477–485 (1960). doi:10.1016/0029-5582(60)90140-1.
Web of Science ®Google Scholar
J. Paldus, in Theory and Applications of Computational Chemistry: The First Forty Years, edited by C.E. Dykstra, G. Frenking, K.S. Kim, and G.E. Scuseria (Elsevier, Edinburgh, London, Amsterdam, 2005), Chap. 7, p. 115.
Google Scholar
R.J. Bartlett and M. Musiał, Rev. Mod. Phys. 79 (1), 291–352 (2007). doi:10.1103/RevModPhys.79.291.
Web of Science ®Google Scholar
G. Hagen, T. Papenbrock, M. Hjorth-Jensen and D.J. Dean, Rep. Prog. Phys. 77 (9), 096302 (2014). doi:10.1088/0034-4885/77/9/096302.
PubMedGoogle Scholar
K. Emrich and J.G. Zabolitzky, in Recent Progress in Many-Body Theories, edited by H. Kümmel and M.L. Ristig (Springer, Berlin, Heidelberg, 1984), pp. 271–278.
Google Scholar
B.H.J. McKellar, C.R. Leonard and L.C.L. Hollenberg, Int. J. Mod. Phys. B 14, 2023–2037 (2000). doi:10.1142/S0217979200001151.
Web of Science ®Google Scholar
L.S. Cederbaum, O.E. Alon and A.I. Streltsov, Phys. Rev. A 73 (4) (2006). doi:10.1103/PhysRevA.73.043609.
Google Scholar
J.S. Arponen, J. Phys. G: Nucl. Phys. 8, L129 (1982). doi:10.1088/0305-4616/8/8/004.
Web of Science ®Google Scholar
J.S. Arponen, R.F. Bishop and E. Pajanne, Phys. Rev. A 36, 2539–2549 (1987). doi:10.1103/PhysRevA.36.2539.
PubMed Web of Science ®Google Scholar
R.F. Bishop, Theor. Chim. Acta 80, 95–148 (1991). doi:10.1007/BF01119617.
Google Scholar
B. Cooper and P.J. Knowles, J. Chem. Phys. 133 (234102), 20120 (2010). doi:10.1063/1.3520564.
Google Scholar
F.A. Evangelista, J. Chem. Phys. 134 (22), 224102 (2011). doi:10.1063/1.3598471.
PubMed Web of Science ®Google Scholar
T. Van Voorhis and M. Head-Gordon, Chem. Phys. Lett. 330, 585–594 (2000). doi:10.1016/S0009-2614(00)01137-4.
Web of Science ®Google Scholar
E.F.C. Byrd, T. Van Voorhis and M. Head-Gordon, J. Phys. Chem. B 106 (33), 8070–8077 (2002). doi:10.1021/jp020255u.
Web of Science ®Google Scholar
S. Lehtola, J. Parkhill and M. Head-Gordon, J. Chem. Phys. 145, 134110 (2016). doi:10.1063/1.4964317.
PubMed Web of Science ®Google Scholar
E. Zarantonello, Technical Report 160 (U.S. Army Math. Res. Centre, Madison, WI, 1960).
Google Scholar
E. Zeidler, Nonlinear Functional Analysis and Its Application II/B (Springer, New York, Heidelberg, Berlin, 1990).
Google Scholar
A. Laestadius and S. Kvaal, SIAM J. Numer. Anal. 56 (2), 660–683 (2018). doi:10.1137/17M1116611.
Web of Science ®Google Scholar
A. Laestadius and F.M. Faulstich, Mol. Phys. 117 (17), 2362–2373 (2019). doi:10.1080/00268976.2018.1564848.
Web of Science ®Google Scholar
R. Schneider, Numer. Math. 113, 433–471 (2009). doi:10.1007/s00211-009-0237-3.
Web of Science ®Google Scholar
T. Rohwedder, ESAIM: Math. Mod. Num. Anal. 47, 421–447 (2013). doi:10.1051/m2an/2012035.
Google Scholar
T. Rohwedder and R. Schneider, ESAIM: Math. Mod. Num. Anal. 47, 1553–1582 (2013). doi:10.1051/m2an/2013075.
Web of Science ®Google Scholar
R.P. Feynman, Phys. Rev. 56 (4), 340–343 (1939). doi:10.1103/PhysRev.56.340.
Google Scholar
M. Reed and B. Simon, Methods of Modern Mathematical Physics I: Functional Analysis (Academic Press, San Diego, New York, 1980).
Google Scholar
K. Schmüdgen, Unbounded Self-adjoint Operators on Hilbert Space. Graduate Texts in Mathematics (Springer, Dordrecht, Heidelberg, 2012).
Google Scholar
C. Hättig, W. Klopper, A. Köhn and D.P. Tew, Chem. Rev. 112, 4–74 (2012). doi:10.1021/cr200168z.
PubMed Web of Science ®Google Scholar
L. Adamowicz, J.-P. Malrieu and V.V. Ivanov, J. Chem. Phys. 112 (23), 10075–10084 (2000). doi:10.1063/1.481649.
Web of Science ®Google Scholar
K. Kowalski, J. Chem. Phys. 148, 094104 (2018). doi:10.1063/1.5010693.
Google Scholar
F. Faulstich, A. Laestadius, S. Kvaal, Ö. Legeza and R. Schneider, SIAM J. Num. Anal. 57, 2579 (2019). doi:10.1137/18M1171436.
Web of Science ®Google Scholar
T. Kinoshita, O. Hino and R.J. Bartlett, J. Chem. Phys. 123 (7), 074106 (2005). doi:10.1063/1.2000251.
PubMed Web of Science ®Google Scholar
F.M. Faulstich, M. Máté, A. Laestadius, M.A. Csirik, L. Veis, A. Antalik, J. Brabec, R. Schneider, J. Pittner, S. Kvaal and Ö. Legeza, J. Chem. Theor. Comp. 15 (4), 2206–2220 (2019). doi:10.1021/acs.jctc.8b00960.
PubMed Web of Science ®Google Scholar
E. Zeidler, Nonlinear Functional Analysis and Its Application I (Springer, New York Heidelberg Berlin, 1986).
Google Scholar
A. Engels-Putzka and M. Hanrath, Mol. Phys. 107 (2), 143–155 (2009). doi:10.1080/00268970902724922.
Web of Science ®Google Scholar
K. Jankowski, L. Meissner and J. Wasilewski, Int. J. Quant. Chem. 28 (6), 931–942 (1985). doi:10.1002/qua.v28:6. doi: 10.1002/qua.560280622
Web of Science ®Google Scholar
T.J. Lee and P.R. Taylor, Int. J. Quant. Chem. 36 (S23), 199–207 (1989). doi:10.1002/qua.v36:23+. doi: 10.1002/qua.560360824
Web of Science ®Google Scholar
C.L. Janssen and I.M.B. Nielsen, Chem. Phys. Lett. 290 (4), 423–430 (1998). doi:10.1016/S0009-2614(98)00504-1.
Web of Science ®Google Scholar
W. Jiang, N.J. DeYonker and A.K. Wilson, J. Chem. Theor. Comp. 8 (2), 460–468 (2012). doi:10.1021/ct2006852 PMID: 26596596.
PubMed Web of Science ®Google Scholar
O. Christiansen, H. Koch and P. Jørgensen, Chem. Phys. Lett. 243, 409–418 (1995). doi:10.1016/0009-2614(95)00841-Q.
Web of Science ®Google Scholar
T.B. Pedersen, B. Fernández and H. Koch, J. Chem. Phys. 114, 6983 (2001). doi:10.1063/1.1358866.
Web of Science ®Google Scholar

Appendix. Proofs of main results

In this appendix all our proofs have been collected and are given in the same order as the corresponding results have appeared above. We will only present a partial proof of Theorem 2.1, as the proofs of points (1) and (2) are standard, and can be found in, e.g. Ref. [Citation22].Proof

Proof of Theorem 2.1

For point (3), we note that F is locally Lipschitz continuous as a consequence of

E

being smooth, which together with strong monotonicity makes points (1) and (2) applicable. The remaining argument follows [Citation23] closely (where the case

E = E_{ECC}

was treated). First, by assumption of R,

F (Z_{*}) = 0

and

F_{d} (Z_{* d}) = 0

are equivalent to

\partial E (Z_{*}) = 0

and

\partial E_{d} (Z_{* d}) = 0

, respectively (note that R commutes with

P_{d}

). Now, Taylor expanding

E

around

Z_{*}

and evaluating at

Z_{* d}

gives

E (Z_{* d}) - E_{*} = \frac{1}{2} ⟨ Z, \partial^{2} E (Z_{*}) Z ⟩ + O (‖ Z ‖^{3}) .

By the smoothness of

E

, there exists a constant

C^{'}

such that

⟨ Z, \partial^{2} E (Z_{*}) Z ⟩ \leq C^{'} ‖ Z ‖^{2} .

Further, the fact that on U we can control the higher-order terms by the quadratic one, we have

| E (Z_{* d}) - E_{*} | \leq C ‖ Z_{* d} - Z_{*} ‖^{2} .

Using Equation (Equation8) gives the full statement in Equation (Equation9)

Proof

Proof of Theorem 3.1

With $F : V \oplus V \to V^{'} \oplus V^{'}$ the flipped gradient and $X \in V \oplus V$ , we obtain ${⟨ X, F (Z) ⟩}_{V \oplus V, V^{'} \oplus V^{'}} = {⟨ RX, \partial E (Z) ⟩}_{V \oplus V, V^{'} \oplus V^{'}} .$ In the sequel, we omit the specification of the spaces in the dual pairing.

Let $Z_{*} \in V \oplus V$ be such that $\partial E (Z_{*}) = 0$ , i.e. $F (Z_{*}) = 0$ . Since F is smooth, local strong monotonicity of F is equivalent to $\partial F (Z_{*}) \in B (V \oplus V, V^{'} \oplus V^{'})$ (the set of bounded linear operators) being coercive, i.e. there exists an $η_{*} > 0$ such that $Δ (X) = ⟨ X, ∂F (Z_{*}) X ⟩ \geq η_{*} ‖ X ‖_{V \oplus V}^{2},$ where $Δ (X)$ is defined by the first equality. (The constant η in Equation (Equation6(6) $⟨ F (Z_{1}) - F (Z_{2}), Z_{1} - Z_{2} ⟩ \geq η ‖ Z_{1} - Z_{2} ‖^{2},$ (6) ) approaches $η_{*}$ as the ball U given in Theorem 2.1 of the local strong monotonicity approaches a point.) To see this, we find an expression for $Δ (h)$ in terms of the energy map, (A1) $⟨ X, F (Z_{*} + ϵX) ⟩ = ⟨ X, F (Z_{*}) + ϵ F^{'} (Z_{*}; X) ⟩ + O (ϵ^{2}) .$ (A1) Here $F^{'} (Z_{*}; X)$ is the directional derivative in the direction of X such that $\begin{aligned} ⟨ X, ∂F (Z_{*}) X ⟩ & = ⟨ X, F^{'} (Z_{*}; X) ⟩ \\ = \frac{d}{d ϵ} ⟨ RX, \partial E (Z_{*} + ϵX) ⟩ |_{ϵ = 0} \\ = ⟨ RX, \partial^{2} E (Z_{*}) X ⟩, \end{aligned}$ where $\partial^{2} E (Z_{*}) \in B (V \oplus V, V^{'} \oplus V^{'})$ . By choosing ε small enough, Equation (EquationA1(A1) $⟨ X, F (Z_{*} + ϵX) ⟩ = ⟨ X, F (Z_{*}) + ϵ F^{'} (Z_{*}; X) ⟩ + O (ϵ^{2}) .$ (A1) ) and strong monotonicity gives the coercivity claim. The logical implication also goes in the reverse direction. (This will be used below.)

Recall that $E_{θ} = E \circ θ$ and that $F_{θ}$ denotes the flipped gradient of $E_{θ}$ . We use that $F_{θ}$ is locally strongly monotone at $W_{*} = θ^{- 1} (Z_{*})$ if and only if $Δ_{θ}$ is coercive, i.e. $Δ_{θ} (X) = ⟨ X, \partial F_{θ} (W_{*}) X ⟩ \geq η_{θ} ‖ X ‖^{2},$ for some $η_{θ} > 0$ and all $X \in V \oplus V$ . A straightforward application of the chain rule now gives $\begin{aligned} Δ_{θ} (X) & = ⟨ M_{*} RX, \partial^{2} E (Z_{*}) (M_{*} X) ⟩, \\ M_{*} & = ∂θ (W_{*}) \in B (V \oplus V, V \oplus V) . \end{aligned}$

We note that this is almost $Δ (M_{*} X)$ . Indeed, (A2) $\begin{aligned} Δ_{θ} (X) & = ⟨ R M_{*} X, \partial^{2} E (Z_{*}) (M_{*} X) ⟩ \\ + ⟨ [M_{*}, R] X, \partial^{2} E (Z_{*}) (M_{*} X) ⟩ \\ = Δ (M_{*} X) + ⟨ (M_{*} R - R M_{*}) X, \partial^{2} E (Z_{*}) (M_{*} X) ⟩ . \end{aligned}$ (A2) In particular, if $M_{*} R = R_{*} M$ then the last term vanishes in the utmost right-hand side of Equation (EquationA2(A2) $\begin{aligned} Δ_{θ} (X) & = ⟨ R M_{*} X, \partial^{2} E (Z_{*}) (M_{*} X) ⟩ \\ + ⟨ [M_{*}, R] X, \partial^{2} E (Z_{*}) (M_{*} X) ⟩ \\ = Δ (M_{*} X) + ⟨ (M_{*} R - R M_{*}) X, \partial^{2} E (Z_{*}) (M_{*} X) ⟩ . \end{aligned}$ (A2) ), and we obtain monotonicity of $F_{θ}$ but with a modified constant.

In the case where $M_{*} R \neq R M_{*}$ , we write $M_{*} = I + m_{*}$ , and note that $M_{*} R - R M_{*} = m_{*} R - R m_{*}$ . We obtain, (A3) $\begin{aligned} Δ_{θ} (X) & \geq η ‖ M_{*} X ‖^{2} - ‖ \partial^{2} E (Z_{*}) ‖ ‖ M_{*} ‖ ‖ m_{*} ‖ ‖ X ‖^{2} \\ \geq [η ‖ M_{*}^{- 1} ‖^{- 2} - C ‖ (1 + ‖ m_{*} ‖) ‖ m_{*} ‖] ‖ X ‖^{2} . \end{aligned}$ (A3) Here, we used that θ has a smooth inverse, implying $‖ M_{*} X ‖ \geq ‖ M_{*}^{- 1} ‖^{- 1} ‖ X ‖$ , and that $‖ M_{*} ‖ \leq I + ‖ m_{*} ‖$ .

Proof

Proof of Corollary 3.2

We consider the Jacobian of the coordinate map, which on block form reads (A4) $\partial θ (T^{'}, Λ^{'}) = (\begin{matrix} \frac{\partial T}{\partial T^{'}} & \frac{\partial T}{\partial Λ^{'}} \\ \frac{∂Λ}{\partial T^{'}} & \frac{∂Λ}{\partial Λ^{'}} \end{matrix}) .$ (A4) For the map $θ_{n}$ (see Equation (Equation17(17) $(T, Λ) = θ_{n} (T^{'}, Λ^{'}) = (T^{'}, G_{n} (Λ^{'})),$ (17) )), we first observe that by definition, $e^{Λ} = e^{Λ^{'}} + O (‖ Λ^{'} ‖^{n + 1}),$ from which it follows, by taking the logarithm and expanding the logarithm around $e^{Λ^{'}}$ , which is a finite Taylor series, $Λ = Λ^{'} + O (‖ Λ^{'} ‖^{n + 1}) .$ We obtain (A5) $\partial θ_{n} (T^{'}, Λ^{'}) = (\begin{matrix} I & 0 \\ 0 & I + O (‖ Λ^{'} ‖^{n + 1}) \end{matrix}) .$ (A5) For the map $θ_{n} \circ θ_{C - ECC}$ we have, using the chain rule, (A6) $\partial θ_{C - ECC} (T^{'}, Λ^{'}) = (\begin{matrix} A (Λ^{'}) & \partial_{Λ^{'}} A (Λ^{'}) T \\ 0 & I + O (‖ Λ^{'} ‖^{n + 1}) \end{matrix}) .$ (A6) Here, $A (Λ^{'})$ is the linear transformation on $V$ such that $S (T^{'}; Λ^{'}) = A (Λ^{'}) T^{'}$ (see Equation (Equation12(12) $QT ϕ_{0} = Q e^{- {Λ^{'}}^{†}} T^{'} ϕ_{0}, Q = I - | ϕ_{0} ⟩ ⟨ ϕ_{0} | .$ (12) )), i.e. $A (Λ^{'})$ can be expressed in terms of the matrix representation of $e^{- (Λ^{'})^{†}}$ , (A7) $A (Λ^{'}) T^{'} = \sum_{μ, ν \in I} X_{μ} ⟨ ϕ_{μ}, e^{- (Λ^{'})^{†}} ϕ_{ν} ⟩ ⟨ ϕ_{ν}, T^{'} ϕ_{0} ⟩ .$ (A7) We have $A (Λ^{'}) = I + O (‖ Λ^{'} ‖)$ , and $\partial_{Λ^{'}} A (Λ^{'}) T^{'} = O (‖ T^{'} ‖)$ . For both maps, the Jacobian of the coordinate transformation at the critical point $W_{*} = (T_{*}^{'}, Λ_{*}^{'})$ becomes $M_{*} = I + m_{*}$ with $m_{*} = O (‖ W_{*} ‖)$ . Applying Theorem 3.1(2), the local strong monotonicity follows, and by Theorem 2.1, quasi-optimality of the truncated solutions and a quadratic error estimate.

Guaranteed convergence for a class of coupled-cluster methods based on Arponen's extended theory

Abstract

GRAPHICAL ABSTRACT

1. Introduction