ABSTRACT
The Coupled-Cluster (CC) theory is one of the most successful high precision methods used to solve the stationary Schrödinger equation. In this article, we address the mathematical foundation of this theory with focus on the advances made in the past decade. Rather than solely relying on spectral gap assumptions (non-degeneracy of the ground state), we highlight the importance of coercivity assumptions – Gårding type inequalities – for the local uniqueness of the CC solution. Based on local strong monotonicity, different sufficient conditions for a local unique solution are suggested. One of the criteria assumes the relative smallness of the total cluster amplitudes (after possibly removing the single amplitudes) compared to the Gårding constants. In the extended CC theory the Lagrange multipliers are wave function parameters and, by means of the bivariational principle, we here derive a connection between the exact cluster amplitudes and the Lagrange multipliers. This relation might prove useful when determining the quality of a CC solution. Furthermore, the use of an Aubin–Nitsche duality type method in different CC approaches is discussed and contrasted with the bivariational principle.
1. Introduction
One of the most successful high accuracy ab initio computational schemes is the Coupled-Cluster (CC) approach [Citation1]. It goes back to Coester [Citation2], who in 1958 suggested using an exponential parametrisation of the wave function. This parametrisation was derived independently by Hubbard [Citation3] and Hugenholtz [Citation4] in 1957 as an alternative to summing many-body perturbation theory (MBPT) contributions order by order. At that time, Coester was not able to come up with working equations that one might try to solve. Those were presented by Čížek [Citation5] after the relevant concepts had been introduced in the context of quantum chemistry. In this work, Čížek mentioned the projective approach of the equations, which is exploited in all conventional CC methods until today. Firstly, in [Citation5] the working amplitudes and energy equations were derived when the cluster operator is approximated by merely double excitations (CCD). Secondly, the CC theory was compared with MBPT, configuration interaction (CI), and the pair cluster expansions of Sinanoğlu [Citation6]. Thirdly, the first ever CCD and linearised CCD computations were reported for nitrogen and a model of benzene. For a more detailed description of the CC history, we refer to reviews by pioneers of the theory. For example, Kümmel [Citation7] and Čížek [Citation8] wrote such articles within the workshop 'Coupled Cluster Theory of Electron Correlation'. Furthermore, see the articles by Bartlett [Citation9], Paldus [Citation10], Arponen [Citation11] and Bishop [Citation12].
Unlike the CI method, the CC formalism does not arise from the Rayleigh–Ritz variational principle and is therefore said to be non-variational in that sense. As a consequence, the CC energy is in general neither equal to the expectation value of the Hamiltonian nor an upper bound to the ground-state energy. The reliability of quantum chemical methods is in most cases based on benchmarking and on the consistency of the results with existing physical and chemical theory. The gold standard of quantum chemistry – the CCSD(T) method [Citation13,Citation14] – is no exception to this. The need for sharp statements about the reliability of an ab initio method is the motivation of this work. Here, we build on a local analysis [Citation15] of the CC theory that also holds in the exact, so-called continuous, formulation with infinitely many one-particle basis functions [Citation16,Citation17].
There is a rich history of mathematical investigations addressing CC methods prior to the local analyses in [Citation15–17]. To give a complete historical account is beyond the scope of this article. We therefore limit ourselves and mention only a few important results. As a system of polynomial equations, the CC equations can have real or, if the cluster operator is truncated, complex solutions. Furthermore, using quasi-Newton–Raphson methods to compute solutions of non-linear equations can lead to divergence since the approximated Jacobian may become singular. This is, in particular, the case when strongly correlated systems are considered. These and other related aspects of the CC theory have been addressed by Živković and Monkhorst [Citation18,Citation19] and Piecuch et al. [Citation20]. Significant advances in the understanding of the nature of multiple solutions of single-reference CC have been made by Živković and Monkhorst [Citation19], Kowalski and Jankowski [Citation21], and by Piecuch and Kowalski [Citation22]. An interesting attempt to address the existence of a cluster operator and cluster expansion in the open-shell case was made by Jeziorski and Paldus [Citation23]. We would also like to mention the coupled-electron pair approximation (CEPA) [Citation24–27]. This approach was introduced as a size-consistent alternative to the CISD method, achieved by modifying the CI equations (through topological factors [Citation28]) to account for higher excitations. This makes CEPA non-variational (for an adapted variational formulation of CEPA see [Citation29]). CEPA can be regarded as an approximation of the CC method and does not form a truncation hierarchy that converges to the full-CI limit [Citation30].
Mathematical analysis is a well-established part of many natural sciences. Plenty of examples show how various fields benefit from mathematical rigor and how mathematical analysis can define the framework of a method's applicability. This work takes off from recent developments of local analyses of CC methods, including the single-reference CC, the extended CC, the tailored CC (TCC) and its special case the CC method tailored by tensor network states (TNS–TCC) [Citation15–17,Citation31,Citation32]. In the spirit of Robert Parr's fundamental approach to quantum chemistry, which was honored during the 58th Sanibel Symposium, we here present some mathematical concepts used to analyse CC methods in a functional analytic framework. These yield rigorous analytical results that rest on mathematical assumptions rather than on benchmarks and interpretations. Adapting these assumptions to cover the computations performed in practice remains a challenge and is the subject of future work. The local analysis puts as a sufficient – but not necessary – condition that the cluster amplitudes are small relative to other constants. We discuss a possible way out of this restriction motivated by the fact that CC calculations are known to work for large (single) amplitudes as well. We furthermore address the $T_1$-diagnostic [Citation33] and mathematically derive a more sophisticated strategy that includes all cluster amplitudes and offers a sufficient condition for a locally unique and quasi-optimal solution (after possibly rotating out the single amplitudes) rather than rejection based on just large single amplitudes. We furthermore complement the literature by a detailed discussion on spectral gap assumptions. In this context, spectrum refers to the point spectrum, i.e. the eigenvalues of relevant operators.
Although a gap between the highest occupied molecular orbital and the lowest unoccupied molecular orbital (HOMO–LUMO gap), or a spectral gap of the exact Hamiltonian (non-degenerate ground state), is crucial for the analysis, we highlight the importance of coercivity conditions, either for the Hamiltonian $\hat H$ or for the Fock operator $\hat F$. Additionally, we derive an optimal constant in the monotonicity proof of the CC function for the finite dimensional case, i.e. the projected CC theory. Comparing the CC Lagrangian with the extended CC formulation [Citation31], we propose by means of the bivariational principle an alternative to measure the quality of the Lagrange multipliers, here interpreted as wave function parameters.
This article is structured as follows: In Section 2, a brief summary of the CC theory is presented. We introduce the set of admissible wave functions and moreover define cluster operators, the CC function, and the CC energy (for a full treatment of the mathematical formulation of CC theory presented here we refer to [Citation16,Citation17]). In Section 3, we discuss the use of local analysis within different CC methods. Key concepts here are (see Section 3.1 for definitions) local strong monotonicity and local Lipschitz continuity of the CC function f, which – if fulfilled – are sufficient conditions for a locally unique solution of f=0 by Zarantonello's theorem. In particular, the importance of so-called Gårding inequalities is demonstrated. This is done both for the Hamiltonian, Section 3.2.1, and for the Fock operator, Section 3.2.2. We conclude in Section 3.3 with an overview of the Aubin–Nitsche method and the bivariational principle as they are used in CC methods for estimating the truncation error of the energy.
The authors are thankful to the organisers of the 58th Sanibel Symposium under which many ideas presented here took form. Moreover, the anonymous referee greatly improved a previous draft of this article – especially putting the local analysis, under consideration here, into the context of the rich quantum chemistry literature on CC methods. This work was supported by the European Research Council (ERC-STG-2014) through the Grant No. 639508, and furthermore supported by the Norwegian Research Council through the CoE Hylleraas Centre for Quantum Molecular Sciences Grant No. 262695. AL and FMF thank Simen Kvaal and Thomas Bondo Pedersen for useful comments and discussions.
2. Wave functions on an exponential manifold
The aim of electronic many-body methods, such as the CC approach, is to solve the electronic Schrödinger equation (SE) $\hat H\psi = E_0\psi$ of an N-electron system. Here, $E_0$ is the ground-state energy and $\hat H$ the self-adjoint Coulomb Hamiltonian. In this work, we restrict our attention to real Hamiltonians and wave functions. We emphasise that the mathematical framework of Hermitian operators is not sufficient to support the necessary spectral theory for quantum mechanics. Thirring exemplified this with the radial momentum operator on the half-line [Citation34].
From a mathematical viewpoint the Coulomb Hamiltonian, like most differential operators, is studied in its weak form to allow a larger variety of solutions. Set (or any other appropriate region in space and number of spin states) and let denote both integration and summation over spatial and spin degrees of freedom. Multiplying the SE on both sides with a smooth and compactly supported function , a so-called test function, and integrating by parts yields (1) where denotes the Coulomb operator (containing both the Coulomb attraction and repulsion) and ψ a solution of the SE. It follows immediately that the l.h.s. of Equation (1) defines a bilinear form with being the underlying algebraic field. Boundedness and ellipticity of this bilinear form, however, are non-trivial consequences that go back to Hardy–Rellich inequalities proving that is bounded (for a general introduction see [Citation35]). Note that this treatment of the SE extends the set of admissible wave functions to the set of antisymmetric $L^2$-functions ψ of finite kinetic energy , i.e. and We denote this space and impose the norm In this topology is dense. Hence, the bilinear form is continuously extendable to . We define the operator where is the dual space of , which we shall denote from now on. Note that indeed maps into since boundedness and ellipticity are preserved under continuous extensions. Furthermore, the r.h.s. of Equation (1) can be generalised to the dual pairing, which allows us to reformulate the SE as an operator equation: Find such that , with being the Riesz representation of ψ. To the best of our knowledge, this general approach to the SE was not considered in the mathematical analyses of CC theory prior to the work of Schneider and Rohwedder [Citation15–17]. Subsequently, we consider this weak formulation and for simplicity write .
Different parameterisations of ψ lead to different approximation schemes; the subject of this article is the CC scheme, i.e. we parameterise ψ on an exponential manifold. We assume that the solution can be written , where is a reference determinant of N one-electron functions and is an element of , the $L^2$-orthogonal complement of . We denote the $L^2$-inner product by and follow the quantum chemistry notation for expectation values of operators, i.e. . In particular, assuming that $\hat H$ supports a ground state, which is always the case for Coulomb systems [Citation36], the Rayleigh–Ritz variational principle reads with . Note that although we assume ψ to be normalisable ($L^2$-summable), we do not impose , but rather . Furthermore, by construction of the solution , we assume intermediate normalisation .
Next, let be an -orthonormal one-electron basis of the space of admissible one-electron wave functions. Unless we explicitly write we refer to the infinite dimensional setting. We construct from this set an -orthonormal Slater basis in the usual fashion denoted . Note that the N-particle basis functions span the infinite dimensional space of all possible excitations with respect to the reference determinant . In this notation we have , where are the -weights of ψ in the given Slater basis, i.e. . We formally define the cluster operators by , where excites the reference state to the state . We obtain with denoting the identity operator. The coefficients are called cluster amplitudes and we say that is a set of admissible cluster amplitudes if and . Due to the one-to-one relationship between cluster amplitudes and linearly parametrised wave functions, a natural choice for a norm on the space of admissible cluster amplitudes is the corresponding wave function norm of [Citation15,Citation16], i.e.
2.1. The exponential ansatz
The CC theory is based on an exponential parametrisation of wave functions. This is an alternative and, assuming full excitation rank (explained below) of the cluster operators, equivalent description of the full CI (FCI) wave function. Since its introduction by Hubbard [Citation3] and, independently, Hugenholtz [Citation4], the unique parametrisation of a wave function ψ by the exponential was assumed to be true and motivated from formal manipulations. However, the unique representation of functions in a Hilbert space is by nature a mathematical problem and was rigorously proven for the exponential parametrisation in the infinite dimensional case by Rohwedder [Citation16].
A key element in deriving the exponential parameterisation from the mathematical viewpoint is the well-definedness of the exponential of (or equivalently the logarithm of ), which is subject of functional calculus. We emphasise that the applicability of functional calculus depends strongly on the operator's domain since different domains may imply different properties of the operator, e.g. boundedness, essential self-adjointness, sectorial spectrum, etc. By the fact that Rohwedder [Citation16] showed the -continuity of cluster operators in a continuous setting, the functional calculus for bounded operators was proven to be applicable.
In the finite dimensional case this result was known in the quantum chemistry community, see, e.g. Živković and Monkhorst [Citation19]. However, this result was revisited by Schneider [Citation15] using the Cauchy–Dunford calculus. To the best of our knowledge, the subtleties addressed in [Citation15,Citation16] have not been part of previous considerations in mathematical analysis of CC theory. These important results demonstrate how quantum chemistry benefits from mathematics on a very fundamental level. The continuous CC theory amounts to the exact formulation where the set forms a basis (in the strict mathematical sense) of the one particle space . In a form appropriate for this article, we recall Rohwedder's result [Citation16]:
(i) Let denote a reference determinant, e.g. the Hartree–Fock solution. Given a wave function , i.e. , set where and note that , i.e. a bounded linear operator from into . Then, if and only if . Furthermore, there exists a constant C independent of such that An equivalent statement holds for the -adjoint of S.
(ii) The exponential map is a $C^\infty$ isomorphism between and . In particular, for any with there exists a unique such that .
Note that this result holds for any orthonormal set of N-particle basis functions spanning the space of selected excitations with respect to the reference determinant . However, it is required that the excitation rank of the cluster operators remains untruncated, i.e. , where corresponds to single excitations, to double excitations, …, to N-fold excitations. Consequently, we have (2) in the case of full excitation rank.
The usual identification between the linear and exponential parametrisation holds [Citation37]: Write and suppose that the linear parametrisation is given by (3) Expanding the exponential in Equation (2), and comparing with Equation (3), then yields and for the amplitudes where is the FCI coefficient of the reference determinant (here ). This shows a one-to-one relation for untruncated linear and exponential parameterisations. When the parametrisation is restricted to the sub-manifold of excitation rank k<N, this one-to-one relationship is in general not true (see Remark 2 in [Citation31]): Consider CCSD for N>2 particles, i.e. . Expanding the exponential yields which is not a CISD parametrisation, except in the trivial case .
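The interplay between the two parametrisations can be illustrated in a toy model. The following sketch is not a Fock-space implementation: it assumes a hypothetical commutative nilpotent algebra generated by a single shift matrix `S`, in which `S**k` stands in for a k-fold excitation of a reference vector `e0`. All names (`shift`, `exp_nilpotent`, `log_unipotent`) and the amplitude values `a`, `b` are illustrative assumptions. Under these assumptions, the sketch shows that a "singles and doubles" operator generates non-zero higher-rank coefficients through the exponential, and that the logarithm recovers the cluster operator when no truncation is applied.

```python
import numpy as np

def shift(n):
    """n x n matrix S with S e_k = e_{k+1}; S is nilpotent (S**n = 0)."""
    return np.eye(n, k=-1)

def exp_nilpotent(T):
    """exp(T) for a nilpotent matrix T: the power series terminates."""
    n = T.shape[0]
    out, term = np.eye(n), np.eye(n)
    for k in range(1, n):
        term = term @ T / k          # term now holds T**k / k!
        out = out + term
    return out

def log_unipotent(U):
    """log(U) for U = I + C with nilpotent C (terminating Mercator series)."""
    n = U.shape[0]
    C = U - np.eye(n)
    out, power = np.zeros((n, n)), np.eye(n)
    for k in range(1, n):
        power = power @ C            # power now holds C**k
        out = out + (-1) ** (k + 1) / k * power
    return out

N = 4                                 # toy "particle number" (maximal rank)
S = shift(N + 1)
a, b = 0.3, -0.2                      # hypothetical single and double amplitudes
T = a * S + b * (S @ S)               # toy analogue of T = T1 + T2

e0 = np.zeros(N + 1)
e0[0] = 1.0                           # toy reference vector
c = exp_nilpotent(T) @ e0             # linear (CI-like) coefficients by rank

T_back = log_unipotent(exp_nilpotent(T))
print("coefficients by rank:", c)
print("round-trip error    :", np.abs(T_back - T).max())
```

In this toy setting the rank-two coefficient equals `b + a**2 / 2` and the rank-three coefficient equals `a * b + a**3 / 6`, which is non-zero although `T` contains no rank-three term. This mirrors the statement that a truncated exponential parametrisation is not a truncated linear one.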
2.2. The CC energy
Being able to express any wave function in on an exponential manifold, it is straightforward to derive the linked CC equations [Citation37]: (4) Here, and all the (vis-à-vis ) excited determinants are assumed to form a basis of the anti-symmetric part of . Note that the above equation defines the CC function f and the CC energy function . Theorem 5.3 from [Citation16] demonstrates that the CC theory provides a wave function that satisfies :
The continuous (and with full excitation rank) CC amplitudes solve fulfilling if and only if the corresponding function solves the SE .
By this fact and with , if solves the SE yields Hence, the CC amplitudes describe a function that provides the system's energy in the usual quantum mechanical setting, i.e. .
In practice, computations are carried out using a finite basis and furthermore with a truncated excitation rank , n<N. The total truncation level can then be denoted by , and where we solve f=0 on to obtain , . We note the following from the literature:
(i) Given a finite one-electron basis , we denote the span of the corresponding Slater basis by . With full excitation rank (n=N) Proposition 4.7 in [Citation15] gives: and if solves the SE on , i.e. . By the argument of Monkhorst in [Citation38] we can establish the reverse: Assume and set , then since we obtain (Equation (38) in [Citation38]) From this we can conclude , i.e. the CC wave function gives the energy when inserted into the Rayleigh–Ritz quotient. Furthermore, we have (where denotes the truncated version of ) by the equivalence between linear and exponential parametrisation as long as full excitation rank is kept. Consequently, solves the SE on , which establishes the reversed implication in Proposition 4.7 in [Citation15].
(ii) However, for n<N we have in general (see for instance Remark 4.9 in [Citation15]) which gives the well-known result that the computed is not an upper bound to . Hence, where and .
(iii) By (ii), strictly speaking, CC methods do not compute wave functions, as does not provide the system's energy and therewith does not fulfill the Copenhagen interpretation's first principle [Citation39]. However, as mathematical analyses in [Citation15–17,Citation31,Citation32] have demonstrated, CC methods do provide approximate wave functions that converge to the solution of the SE (as , ). The Copenhagen interpretation is formulated for full systems, which correspond to the continuous CC formulation, and does not contain any statement about approximative solutions. This raises the fundamental question of what properties should be demanded for approximative solutions.
(iv) To contrast with the next section, we would also like to point out the work [Citation19] where, for a finite basis, the CC equations were analysed in a perturbational setting. Writing where we followed the notation in [Citation19] (see Equations (A9) and (A10)), the CI equations are obtained at and corresponds to the CC case. From this and under the assumption of a finite one-electron basis, both the reality and multiplicity of the CC solutions were investigated with respect to pole and branch cut singularities in the complex plane. The emergence of multiple solutions is certainly interesting and worth pursuing. The local analysis studied here, however, instead deals with the establishment of a locally unique solution under certain assumptions. Note that the local behaviour of a solution is important for the applicability and convergence of Newton–Raphson and quasi-Newton methods.
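To make the role of quasi-Newton methods concrete, the following minimal sketch solves a small non-linear system with a fixed, diagonal approximation of the Jacobian. The residual `f`, the vector `d` (playing the role of a diagonal approximate Jacobian) and the constant term `g` are hypothetical and chosen only for illustration; they are not CC amplitude equations.

```python
import numpy as np

# Hypothetical toy data: diagonal approximate Jacobian and constant term.
d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.10, -0.20, 0.15, 0.05])

def f(t):
    """Toy residual: diagonal linear part + weak quadratic coupling + constant."""
    return d * t + 0.05 * t * t.sum() + g

def quasi_newton(f, d, t0, tol=1e-12, max_iter=100):
    """Iterate t <- t - f(t) / d, i.e. Newton's method with the Jacobian
    replaced by the fixed diagonal matrix diag(d)."""
    t = t0.copy()
    for it in range(max_iter):
        r = f(t)
        if np.linalg.norm(r) < tol:
            return t, it
        t = t - r / d
    raise RuntimeError("quasi-Newton iteration did not converge")

t_star, n_iter = quasi_newton(f, d, np.zeros(4))
print("solution  :", t_star)
print("iterations:", n_iter)
print("residual  :", np.linalg.norm(f(t_star)))
```

The iteration converges here because the non-linear coupling is weak compared with the smallest entry of `d`. If that entry approaches zero, the update `r / d` becomes large and the iteration may diverge, which is the toy counterpart of an approximated Jacobian that becomes (nearly) singular.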
3. Local analysis in CC theory
The CC equations (linked and unlinked) can be formulated as a non-linear Galerkin scheme, which is a well-established framework in numerical analysis to convert the continuous Schrödinger equation to a discrete problem. Instead of solving the full problem, Galerkin methods solve the CC equations in a finite dimensional subspace . Note that the CC equations remain the same, only the space spanned by the considered has changed. Reducing the problem to a finite-dimensional vector subspace allows one to compute an approximate solution numerically via Newton–Raphson or quasi-Newton methods. Galerkin methods allow a local analysis, which is useful for CC theory due to the multitude of solutions [Citation18–23] and the use of quasi-Newton methods that require certain local behaviour of the solutions. Local analysis furthermore allows reliable statements about the existence and local uniqueness of Galerkin solutions as well as quantitative statements on the basis-truncation error. Its backbone is formed by a local version of Zarantonello's theorem [Citation40]:
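Let $f: X \to X'$ be a map between a Hilbert space $X$ and its dual $X'$, and let $x_* \in B_\delta$ be a root, $f(x_*) = 0$, where $B_\delta$ is an open ball of radius δ around $x_*$. Assume that f is Lipschitz continuous and strongly monotone in $B_\delta$ with constants L>0 and $\gamma > 0$, respectively. Then the root $x_*$ is unique in $B_\delta$. Indeed, there is a ball $C_\varepsilon \subset X'$ with $0 \in C_\varepsilon$ such that the solution map $f^{-1}: C_\varepsilon \to X$ exists and is Lipschitz continuous, implying that the equation $f(x_* + x) = y$ has a unique solution $x = f^{-1}(y) - x_*$, depending continuously on y, with norm $\|x\|_X \le \delta$. Moreover, let $X_d \subset X$ be a closed subspace such that $x_*$ can be approximated sufficiently well, i.e. the distance $d(x_*, X_d)$ is small. Then, the projected problem $f_d(x_d) = 0$ has a unique solution $x_d \in X_d \cap B_\delta$ and $\|x_* - x_d\|_X \le \frac{L}{\gamma}\, d(x_*, X_d)$, i.e. $x_d$ is a quasi-optimal solution.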
The concept of quasi-optimality was introduced in Jean Céa's dissertation [Citation41] in 1964 for linear Galerkin schemes and was extended over the years to the non-linear case. It ensures that the Galerkin solution in a fixed approximation space is, up to a multiplicative constant, the closest element to the exact solution. For obvious reasons this is a desired property for CC schemes. The different CC methods vary, however, in more than just minor details, which makes establishing this property a conceptually different and challenging task for each method.
3.1. Locally unique solutions and quasi-optimality
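We start by elaborating on the assumptions of Zarantonello's theorem in a more demonstrative way. Here, the notation $\langle x, y\rangle = \sum_\mu x_\mu y_\mu$ is used for sequences $x = (x_\mu)$ and $y = (y_\mu)$, and $\|\cdot\|_{\mathcal V}$ denotes the norm on the space $\mathcal V$ of admissible cluster amplitudes, with $\|\cdot\|_{\mathcal V'}$ the corresponding dual norm. In the context of the CC theory, the CC function f from Equation (4) is said to be strongly monotone if for sets of cluster amplitudes $t$ and $t'$ there exists a $\gamma > 0$ such that $$\langle f(t) - f(t'), t - t'\rangle \ge \gamma\, \|t - t'\|_{\mathcal V}^2. \quad (5)$$ If this inequality is true for all $t, t'$ inside some ball then f is said to be locally strongly monotone. The CC function f is further said to be Lipschitz continuous if there exists a constant L>0 such that $$\|f(t) - f(t')\|_{\mathcal V'} \le L\, \|t - t'\|_{\mathcal V}. \quad (6)$$ In direct analogy with local strong monotonicity, we define local Lipschitz continuity if Equation (6) is fulfilled for all cluster amplitudes inside some ball.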
To exemplify these concepts in a simple way we consider a smooth function $f: \mathbb{R} \to \mathbb{R}$. Strong monotonicity with constant γ implies that the derivative satisfies $f' \ge \gamma > 0$, i.e. f is a strictly monotonically increasing function. Note that strictly monotone functions are injective (one-to-one), which implies local invertibility. Hence, this already ensures local uniqueness of the function's root $x_*$, if it exists. Lipschitz continuity on the other hand implies that $|f'| \le L$. Hence, the assumptions in Zarantonello's theorem are restrictions on the function's slope, namely $\gamma \le f' \le L$. By introducing normed operator spaces, these restrictions can be generalised to vector valued and even infinite dimensional functions f.
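The one-dimensional picture can be checked numerically. The sketch below assumes the illustrative function $f(x) = 2x + \sin x - 1$, whose slope $2 + \cos x$ lies between $\gamma = 1$ and $L = 3$; neither the function nor the helper names stem from the CC literature. The root is computed with the damped fixed-point map $x \mapsto x - (\gamma/L^2) f(x)$, which is a contraction with factor $\sqrt{1 - \gamma^2/L^2}$ whenever f is strongly monotone and Lipschitz continuous. This contraction argument is the standard route to Zarantonello-type results.

```python
import math

GAMMA, LIP = 1.0, 3.0        # f'(x) = 2 + cos(x) lies in [1, 3]

def f(x):
    """Illustrative strongly monotone and Lipschitz continuous function."""
    return 2.0 * x + math.sin(x) - 1.0

def solve_by_contraction(f, x0, gamma, lip, n_iter=1000):
    """Fixed-point iteration x <- x - (gamma / lip**2) * f(x)."""
    alpha = gamma / lip**2
    x = x0
    for _ in range(n_iter):
        x = x - alpha * f(x)
    return x

def slope(x, y):
    """Difference quotient; bounded by GAMMA from below and LIP from above."""
    return (f(x) - f(y)) / (x - y)

root = solve_by_contraction(f, 5.0, GAMMA, LIP)
samples = [(-3.0, 2.0), (0.1, 0.2), (1.0, 7.5), (-10.0, 10.0)]
slopes = [slope(x, y) for x, y in samples]

print("root     :", root)
print("residual :", f(root))
print("slopes   :", slopes)
```

The damping parameter is deliberately conservative. It guarantees contraction from the two constants alone, without using any further information on f.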
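Returning to the general case, the Lipschitz continuity is key to deriving the quasi-optimality in the case of Galerkin solutions. We assume that $\mathcal V_d \subset \mathcal V$ is the considered approximation space supporting the Galerkin solution $t_d$, i.e. $\langle f(t_d), v_d\rangle = 0$ for all $v_d \in \mathcal V_d$. Since the exact solution $t_*$ fulfils $f(t_*) = 0$, we then have $f(t_*) - f(t_d) \perp \mathcal V_d$, i.e. $\langle f(t_*) - f(t_d), v_d\rangle = 0$ for all $v_d \in \mathcal V_d$, in particular for $v_d = u - t_d$ with $u \in \mathcal V_d$. Starting from the strong monotonicity, we deduce for any $u \in \mathcal V_d$ that $$\gamma\, \|t_* - t_d\|_{\mathcal V}^2 \le \langle f(t_*) - f(t_d), t_* - t_d\rangle = \langle f(t_*) - f(t_d), t_* - u\rangle \le L\, \|t_* - t_d\|_{\mathcal V}\, \|t_* - u\|_{\mathcal V}.$$ Because u was chosen arbitrarily, the above estimate holds for all $u \in \mathcal V_d$, which implies the quasi-optimality: $$\|t_* - t_d\|_{\mathcal V} \le \frac{L}{\gamma}\, d(t_*, \mathcal V_d). \quad (7)$$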
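The quasi-optimality estimate can be tested in a finite-dimensional toy setting. The sketch below assumes a linear map $f(x) = Ax - b$ on $\mathbb{R}^n$ with the Euclidean norm, where the random matrix `A`, the right-hand side `b` and the coordinate subspace spanned by the first `d` unit vectors are all hypothetical choices. For such a map the strong monotonicity constant is the smallest eigenvalue of the symmetric part of `A`, and the Lipschitz constant is the spectral norm of `A`.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 30, 8

M = rng.standard_normal((n, n))
S = M @ M.T / n + np.eye(n)          # symmetric part, eigenvalues >= 1
K = (M - M.T) / 2                    # antisymmetric part, <K x, x> = 0
A = S + K
b = rng.standard_normal(n)

gamma = np.linalg.eigvalsh(S).min()  # strong monotonicity constant of f
lip = np.linalg.norm(A, 2)           # Lipschitz constant of f

x_star = np.linalg.solve(A, b)       # exact root of f(x) = A x - b

P = np.eye(n)[:, :d]                 # orthonormal basis of the subspace X_d
x_d = P @ np.linalg.solve(P.T @ A @ P, P.T @ b)   # Galerkin solution

dist = np.linalg.norm(x_star - P @ (P.T @ x_star))   # d(x_star, X_d)
err = np.linalg.norm(x_star - x_d)

print("best approximation error:", dist)
print("Galerkin error          :", err)
print("quasi-optimality bound  :", lip / gamma * dist)
```

The Galerkin error can never be smaller than the best approximation error, and the estimate bounds it from above by the factor $L/\gamma$. A non-symmetric `A` is used on purpose, since for symmetric problems the Galerkin solution has additional optimality properties that the CC function does not share.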
To apply Zarantonello's theorem to CC methods, the main challenge is to demonstrate a strictly positive γ in Equation (5) such that strong monotonicity holds locally around the solution that corresponds to the ground state. The original idea in [Citation15] to obtain such a result in the finite-dimensional projected CC theory assumed the existence of an HOMO–LUMO gap. Further, more technical assumptions on the Fock operator (see Gårding inequality below) were needed to achieve a generalisation to the continuous CC setting [Citation17], which also has a counterpart for the Hamiltonian. We refer the reader to [Citation15–17,Citation31,Citation32] for the detailed proofs and the assumptions made, not only within the traditional CC formalism, but also for the TCC and extended CC methods. However, we remark that these assumptions are sufficient but not necessary conditions. One example is given by metals: Despite their typically small or negligible HOMO–LUMO gaps, the single-reference CC method can often compute metallic effects quite well. This suggests that the HOMO–LUMO gap assumption, which limits the results' applicability, can be lifted in the case of non-multi-configuration systems [Citation32]. See also [Citation23] for a CC theory that considers open-shell systems where no HOMO–LUMO gap exists.
Here, we extend the results in [Citation15–17,Citation31,Citation32] by optimising the strong monotonicity constant γ, which yields weaker restrictions on the solution's cluster amplitudes $t_*$. Further investigations need to be undertaken before the presented analysis can lead to practical results on the reliability of the CC approach. However, we suggest an estimate on the CC amplitudes that is sufficient to guarantee the existence of a locally unique CC solution (see Equation (13)) and contrast it with the single amplitudes diagnostic of [Citation33].
3.2. Local strong monotonicity of the CC function
In the literature there are two different proofs that the infinite dimensional (continuous) CC function f is locally strongly monotone [Citation17] (see also [Citation31] for the extended CC function). Even though spectral-gap assumptions enter the arguments, it is the so-called Gårding constants that give a sufficient condition for the local strong monotonicity, as will be demonstrated below. This fact emerges from the analysis in [Citation17] but was noted and elaborated within the analysis of the extended CC method in [Citation31]. We here furthermore improve the existing analysis by optimising the constants. We start by defining the Gårding inequality that will be used extensively in the sequel:
An operator $A$ fulfills a Gårding inequality if there exists a real constant e such that $A + e$ is coercive, i.e. there exists a constant c>0 that depends on e (we denote this dependence by $c = c(e)$) such that $\langle \psi, (A + e)\psi\rangle \ge c(e)\, \|\psi\|^2$ for all admissible wave functions ψ, where $\|\cdot\|$ is the norm of the space of admissible wave functions. The coercivity above describes a particular growth behaviour of $\langle \psi, (A + e)\psi\rangle$ as the lower bound becomes large when the wave function is at the extreme of the space, e.g. wave functions with a large kinetic energy. Subsequently, we denote the l.h.s. of Equation (5) by Δ, i.e. for two sets of CC amplitudes $t$ and $t'$ we have $\Delta = \langle f(t) - f(t'), t - t'\rangle$. We further set , which yields by the CC equations in Equation (4) the equality (8) Next, we elaborate on Gårding inequalities for two different operators that imply local strong monotonicity of the CC function, by bounding the r.h.s. of Equation (8). Interestingly, for the finite-dimensional (projected) CC method, only the latter approach has a counterpart (using the particular structure of the Fock operator ).
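A Gårding inequality can be made tangible in a discretised setting. The sketch below assumes a hypothetical one-dimensional model operator $-\frac{1}{2}\,\mathrm{d}^2/\mathrm{d}x^2 + V$ with an attractive square well, discretised by finite differences with Dirichlet boundary conditions; it is not the Coulomb Hamiltonian. The matrix `M` plays the role of the norm that includes the kinetic energy, and `garding_constant` returns the largest constant c for which the discrete inequality holds with a given shift e. The common grid-spacing weight is dropped on both sides of the inequality since it cancels.

```python
import numpy as np

n, box = 200, 10.0
x = np.linspace(-box, box, n + 2)[1:-1]     # interior grid points
h = x[1] - x[0]

# Finite-difference matrix of -d^2/dx^2 with Dirichlet boundary conditions.
K = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
V = np.where(np.abs(x) < 1.0, -5.0, 0.0)    # hypothetical attractive well
H = 0.5 * K + np.diag(V)                    # toy Hamiltonian
M = np.eye(n) + K                           # psi.M.psi = |psi|^2 + |psi'|^2

def garding_constant(H, M, e):
    """Largest c with <psi,(H+e)psi> >= c <psi,M psi> for all psi, i.e. the
    smallest eigenvalue of the generalised problem (H + e) v = c M v."""
    L = np.linalg.cholesky(M)               # M = L L^T
    A = H + e * np.eye(H.shape[0])
    C = np.linalg.solve(L, np.linalg.solve(L, A).T).T   # L^{-1} A L^{-T}
    return np.linalg.eigvalsh((C + C.T) / 2).min()

E_min = np.linalg.eigvalsh(H).min()         # discrete ground-state energy
c0 = garding_constant(H, M, 0.0)            # no shift
c1 = garding_constant(H, M, -E_min + 1.0)   # shift above -E_min

print("ground-state energy:", E_min)
print("c(0)               :", c0)
print("c(-E_min + 1)      :", c1)
```

Without a shift the constant is negative because the well supports a bound state of negative energy, so the operator itself is not coercive. Once e exceeds the negative of the lowest eigenvalue, the shifted operator is positive definite and the constant becomes strictly positive, which is the behaviour the definition asks for.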
3.2.1. A Gårding inequality for the Hamiltonian
We here assume a spectral gap of , i.e. for all ψ that are $L^2$-orthogonal to the ground state we have , for some , i.e. we assume a non-degenerate ground state. We also assume that is a good approximation of the exact wave function, i.e. is small. It then holds (see Lemma 11 in [Citation31]) (9) with . Thus, is close to and strictly positive, if ϵ is sufficiently close to zero. Using the argument in [Citation17,Citation31] (see proof of Theorem 3.4 in [Citation17], and also Equation (16) with together with the proof of Theorem 16 in [Citation31]), we obtain (10) In [Citation17], the first term of Equation (10) was bounded by a constant times , achieved by combining the Gårding inequality with Equation (9).
From Lemma 11 in [Citation31], it follows that However, this can be further strengthened to with the optimal constant From this we conclude (11) which yields the following sufficient condition for the local strong monotonicity of f, namely (12) Given , we observe that a sufficiently small ϵ and , such that is small enough relative to , guarantee that Equation (12) is fulfilled. (Recall that and are equivalent, see Section 2.1.)
To finalise this section, we offer the following interpretation of Equation (11), providing a more descriptive approach to Equation (12). We see as e tends to from above, the quotient goes to one from below. Furthermore, assume that goes to zero from above as e approaches from above. This suggests an optimal value of . For instance, choosing implies such that is eliminated from the expression. Assuming further that corresponds to an yields . In conclusion, as long as , the Gårding constant offers a direct estimate of the monotonicity constant . We therefore obtain the following (approximate) sufficient condition for local strong monotonicity (13) Note that , for some constant K. However, a sharp estimate for this constant is the subject of current research. Thus, for Zarantonello's theorem to guarantee a locally unique solution, the exact amplitudes cannot be too large relative to . We remark that by an appropriate choice of the reference determinant , the single amplitudes do not contribute to (the overall) . Thus, if is too large then this is a consequence of (doubles, triples, etc.). Numerical investigations are left for future work but we can already compare this mathematically derived sufficient condition for locally unique CC solutions with the $T_1$-diagnostic of [Citation33]. Given the truncation level n of the excitation rank, the diagnostic proposed here uses all cluster amplitudes and not just the single amplitudes . This is a clear advantage since, as mentioned above, orbital rotations can be used to rotate out the single amplitudes. However, our diagnostic offers only a sufficient and not a necessary criterion for a locally unique solution, i.e. for large the current diagnostic is agnostic about local uniqueness and only states that local strong monotonicity cannot be inferred from this particular analysis. We hope that future work will clarify the situation further.
3.2.2. A Gårding inequality for the Fock operator
On the other hand, assume an HOMO–LUMO gap of the Fock operator and that is the Hartree–Fock solution, i.e. with The HOMO–LUMO gap thus corresponds to a spectral gap of the Fock operator and we regard as the ground-state energy of . Let and choose as eigenbasis of , i.e. for all k. We observe that , and with . The argument proving that the CC function f is locally strongly monotone can then be outlined as follows.
The considered Fock operator is assumed to fulfill a Gårding inequality. Thus there exists a constant e such that is coercive, i.e. For the sake of simplicity we use the same symbols for the Gårding constants of as for the Hamiltonian. In complete analogy with , the argument in [Citation15,Citation31] shows that (14) and we moreover define (15) Following [Citation17], for a fixed we define the map from the space of cluster amplitudes into the space of wave functions , with . Hence, (16) where , and assume that for some L>0 (not too large) (17) As a technical remark, the assumption in [Citation17] is the stronger requirement that is Lipschitz continuous as a map from the space of cluster amplitudes to . However, we here note that Equation (17) is sufficient to derive the CC function's local strong monotonicity, as will be evident shortly. Inserting the identity (a consequence of Equation (16) and ) into Equation (8), as well as using Equations (14) and (17), we obtain (18) Consequently, local strong monotonicity holds if . Repeating the argument presented in the previous section, with the obvious adaptations, we obtain (19) as a sufficient condition for f to be locally strongly monotone. Here, no explicit assumption on enters. The main drawback of the assumption in Equation (19) is that the constant L of the inequality in Equation (17) has to be determined. Further analysis of this constant is postponed for later work.
Before we conclude this section we exemplify how the Gårding constant c can be chosen in the finite dimensional setting. In this case the commutator is an excitation operator (which implies ) and is simply the similarity transformation of the fluctuation potential . This offers the following insight into the optimal constant in Equation (14) for the truncated case. As in [Citation15], we define the norm on by It follows that Using instead of and making the assumption in Equation (17) also for the truncated theory (denoting the Lipschitz constant in this new topology by ), we obtain (20) Comparing the local strong monotonicity estimates in Equations (18) and (20) suggests that the finite-dimensional version of equals one. Furthermore, at first glance it appears that the estimate in Equation (20) is obtained without imposing a Gårding inequality. A key observation here is that the choice of the norm makes on fulfill a Gårding inequality with and . Indeed, the inequality is saturated, meaning that equality holds. It then follows immediately from Equation (15) that . Thus, in agreement with Equation (19) we have obtained the condition .
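The saturation of a Gårding inequality under a suitably chosen norm can be illustrated with a small numerical toy. The sketch below is not the construction of the text: it assumes a diagonal operator with made-up eigenvalues (`LAMBDA_0`, `EXCITED`), restricts it to the directions orthogonal to the ground state, and uses a hypothetical weighted norm built from the eigenvalue gaps, with the shift `-LAMBDA_0`. Under these assumptions the shifted quadratic form equals the squared weighted norm (constant one, inequality saturated), whereas in the plain Euclidean norm only the spectral gap is available as a coercivity constant.

```python
import numpy as np

# Toy spectrum (assumed values, for illustration only).
LAMBDA_0 = -1.3                                  # "ground-state" eigenvalue
EXCITED = np.array([-0.5, 0.2, 0.9, 1.7])        # remaining eigenvalues
GAPS = EXCITED - LAMBDA_0                        # all strictly positive


def shifted_form(x, shift=-LAMBDA_0):
    """Quadratic form <(F + shift) x, x> for the diagonal toy operator F."""
    return float(np.sum((EXCITED + shift) * x**2))


def weighted_norm_sq(x):
    """Squared norm weighted by the eigenvalue gaps (hypothetical choice)."""
    return float(np.sum(GAPS * x**2))


def euclidean_coercivity_constant():
    """Best constant c with <(F - LAMBDA_0) x, x> >= c * |x|^2: the spectral gap."""
    return float(GAPS.min())


if __name__ == "__main__":
    x = np.random.default_rng(0).standard_normal(EXCITED.size)
    print("shifted form     :", shifted_form(x))
    print("weighted norm^2  :", weighted_norm_sq(x))
    print("gap * |x|^2      :", euclidean_coercivity_constant() * float(x @ x))
```

In this toy setting the first two printed numbers coincide, while the third is only a lower bound; how the constants of the actual Fock operator behave is the subject of the analysis above and is not settled by this sketch.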
To conclude this section, we note that we have formulated an alternative to the diagnostic in Equation (13): Assume a finite one-electron basis and suppose that satisfies Equation (17) with the norm and locally around the solution amplitudes. Then local strong monotonicity implies a locally unique CC solution. Whether Equation (17) with holds without the assumption of a small is an interesting and still open question. Furthermore, the above analysis can be generalised to any single particle operator fulfilling certain properties (see [Citation15,Citation32]).
3.3. The CC method's numerical analysis
Since CC methods are computational schemes, their convergence behaviour is one of the main objects of study. This covers whether or not the method converges towards the exact solution as well as how fast it converges. We note that the quasi-optimality as given in Equation (7) yields as (for increasing approximation spaces ). Furthermore, in the case of the CC method one studies the CC-energy residual A major difference between the CI and CC methods is that the CC formalism is not variational in the Rayleigh–Ritz sense. Consequently, it is not evident that the CC energy error decays quadratically with respect to the error of the wave function or cluster amplitudes. In the sequel we present two approaches that were used in previous mathematical analyses of different CC methods to derive such quadratic error bounds [Citation17,Citation31,Citation32].
3.3.1. The Aubin–Nitsche duality method
The Aubin–Nitsche duality method is a standard tool for deriving a priori error estimates for finite element methods. It was introduced independently by Aubin [Citation42], Nitsche [Citation43] and Oganesyan and Rukhovets [Citation44]. We here elaborate on the Aubin–Nitsche duality type method used in [Citation15,Citation17,Citation32] to derive a quadratic error bound for the CC method and the closely related TNS-TCC method, a special case of the tailored CC method [Citation45]. This approach exploits the mathematical framework introduced by Bangerth and Rannacher [Citation46]. The untruncated Euler–Lagrange method gives the Lagrangian with f and from Equation (4). The corresponding Gâteaux derivative in direction is denoted and we study fulfilling (21) for all pairs of CC amplitude vectors . Under the assumption that f is locally strongly monotone inside a ball around , there exists a unique determined by such that solves Equation (21). Note that the assumptions imposed to ensure local strong monotonicity are different for the single-reference CC method [Citation15,Citation17] and the TNS-TCC method [Citation32]. Moreover, there exists a solution to the corresponding discretisation of the problem that approximates quasi-optimally [Citation17,Citation32]. Equipped with these so-called dual solutions, the energy-error characterisation given by Bangerth and Rannacher [Citation46] yields with arbitrarily chosen discrete CC amplitudes . The given remainder term is cubic in the primal and dual error, i.e. and . Using this energy-error characterisation, a quadratic energy-error bound for the single-reference CC method [Citation15,Citation17] and the TNS-TCC method [Citation32] follows.
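The role of the dual solution can be made tangible with a finite-dimensional toy problem. The following sketch is only an illustration under stated assumptions and is not the CC Lagrangian or the Bangerth–Rannacher identity itself: the map `f`, the functional `energy`, the matrices `A`, `B` and the vectors `b`, `c` are all invented for this example, and the Lagrangian is assumed to have the generic form `energy(t) + z @ f(t)`. The map `f` is strongly monotone by construction (a symmetric positive definite linear part plus an elementwise cubic). The sketch compares how the error of the plain functional and of the Lagrangian scale when the exact primal and dual solutions are perturbed by an amount `delta`.

```python
import numpy as np

N = 5
_rng = np.random.default_rng(0)
_M = _rng.standard_normal((N, N))
A = _M @ _M.T + N * np.eye(N)            # symmetric, eigenvalues >= N
b = _rng.standard_normal(N)
_B0 = _rng.standard_normal((N, N))
B = (_B0 + _B0.T) / 2                    # symmetric matrix in the toy functional
c = _rng.standard_normal(N)


def f(t):
    """Toy 'amplitude equations': strongly monotone with constant >= N."""
    return A @ t + 0.1 * t**3 - b


def jacobian(t):
    return A + np.diag(0.3 * t**2)


def energy(t):
    """Toy 'energy' functional evaluated at the amplitudes t."""
    return c @ t + 0.5 * t @ B @ t


def energy_gradient(t):
    return c + B @ t


def lagrangian(t, z):
    """Assumed generic form: functional plus constraint paired with multipliers z."""
    return energy(t) + z @ f(t)


def solve_primal_dual(newton_steps=30):
    """Newton iteration for f(t) = 0, then one linear solve for the dual z."""
    t = np.zeros(N)
    for _ in range(newton_steps):
        t = t - np.linalg.solve(jacobian(t), f(t))
    z = -np.linalg.solve(jacobian(t).T, energy_gradient(t))
    return t, z


def error_scaling(deltas=(1e-2, 1e-3, 1e-4), seed=1):
    """Errors of energy(t) and lagrangian(t, z) under perturbations of size delta."""
    t_star, z_star = solve_primal_dual()
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(N)
    v = rng.standard_normal(N)
    u /= np.linalg.norm(u)
    v /= np.linalg.norm(v)
    e_star = energy(t_star)
    plain, lagr = [], []
    for delta in deltas:
        plain.append(abs(energy(t_star + delta * u) - e_star))
        lagr.append(abs(lagrangian(t_star + delta * u, z_star + delta * v) - e_star))
    return np.array(plain), np.array(lagr)


if __name__ == "__main__":
    plain, lagr = error_scaling()
    print("plain functional errors:", plain)
    print("Lagrangian errors      :", lagr)
```

Because the pair of exact primal and dual solutions is a stationary point of the toy Lagrangian, its error is expected to shrink by roughly a factor of one hundred when `delta` shrinks by ten, while the plain functional only gains a factor of about ten. This mirrors, in a much simpler setting, why the dual solutions are needed to obtain a quadratic energy-error bound.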
3.3.2. The bivariational approach
The extended version of the CC method rests on Arponen's bivariational approach [Citation47,Citation48]. This unconventional formulation of the CC method parametrises two independent wave functions and thus makes use of two sets of cluster amplitudes and . It gained recent attention in the study [Citation31] and has a major advantage as far as the error analysis is concerned, namely, the energy itself is stationary, i.e. the solution is a critical point of the bivariational energy, see Equation (22). Consequently, when the corresponding Galerkin solution is close to the exact solution, a quadratic error estimate is guaranteed. Subsequently, we elaborate on this further.
Considering the Rayleigh–Ritz quotient, we write Hence, is a stationary point of , i.e. . By Taylor expanding around we obtain the quadratic error estimate for the Rayleigh–Ritz quotient As mentioned before, the CC formalism does not arise from the Rayleigh–Ritz variational principle. However, it can be described by Arponen's bivariational approach, as follows. Let the bivariate quotient be (22) Equation (22) can be seen as a generalisation of the Rayleigh–Ritz quotient where a stationary point is given by a left and right eigenvector of with corresponding eigenvalue . Note that is no longer bounded from below, hence critical points do not necessarily correspond to extremal points as they do for . In the extended CC theory, the bivariational quotient is studied indirectly by means of the so-called flipped gradient [Citation31]. Following [Citation47], we assume and note that there exists such that (cf. Section 2.1). Then and consequently there exists a cluster operator so that . This defines a smooth coordinate map Φ from cluster amplitudes to wave functions . The flipped gradient is then given by , where we introduced the flipping map Under certain assumptions, is locally strongly monotone [Citation31]. By the extended CC approach [Citation31], and solve the SE if and only if . Note that implies and thus a quadratic energy error.
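The stationarity of a bivariational quotient at a pair of left and right eigenvectors can be checked numerically on a small matrix. The sketch below is a toy illustration and not the extended CC method: it assumes a random symmetric matrix `H`, a unit lower-triangular matrix `S` (loosely playing the role of an exponential of a cluster operator), and the similarity-transformed, non-symmetric matrix `H_bar`; all of these names and values are made up for the example. It compares the error of the one-sided Rayleigh quotient of `H_bar` with that of the two-sided quotient when the eigenvectors are perturbed by an amount `delta`.

```python
import numpy as np


def rayleigh_quotient(H, psi):
    """<psi, H psi> / <psi, psi>."""
    return (psi @ H @ psi) / (psi @ psi)


def bivariational_quotient(H, psi_left, psi_right):
    """<psi_left, H psi_right> / <psi_left, psi_right>."""
    return (psi_left @ H @ psi_right) / (psi_left @ psi_right)


def build_toy_problem(n=6, seed=0):
    """Non-symmetric matrix with a known eigenvalue and left/right eigenvectors."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    H = (a + a.T) / 2                          # symmetric toy "Hamiltonian"
    evals, evecs = np.linalg.eigh(H)
    e0, psi0 = evals[0], evecs[:, 0]
    # Unit lower-triangular S is always invertible (its determinant is one).
    S = np.eye(n) + 0.3 * np.tril(rng.standard_normal((n, n)), k=-1)
    H_bar = np.linalg.solve(S, H @ S)          # S^{-1} H S, same spectrum as H
    right = np.linalg.solve(S, psi0)           # H_bar @ right = e0 * right
    left = S.T @ psi0                          # left @ H_bar = e0 * left
    return H_bar, e0, left, right


def energy_errors(deltas=(1e-2, 1e-3, 1e-4), seed=1):
    """Errors of the one-sided and two-sided quotients for perturbed vectors."""
    H_bar, e0, left, right = build_toy_problem()
    rng = np.random.default_rng(seed)
    d_left = rng.standard_normal(left.size)
    d_right = rng.standard_normal(right.size)
    d_left /= np.linalg.norm(d_left)
    d_right /= np.linalg.norm(d_right)
    one_sided, two_sided = [], []
    for delta in deltas:
        r = right + delta * d_right
        l = left + delta * d_left
        one_sided.append(abs(rayleigh_quotient(H_bar, r) - e0))
        two_sided.append(abs(bivariational_quotient(H_bar, l, r) - e0))
    return np.array(one_sided), np.array(two_sided)


if __name__ == "__main__":
    one_sided, two_sided = energy_errors()
    print("one-sided (Rayleigh) errors :", one_sided)
    print("two-sided (bivariational)   :", two_sided)
```

For this toy matrix the two-sided error is expected to decrease roughly quadratically in `delta`, whereas the one-sided quotient of the non-symmetric matrix is only first-order accurate. The example says nothing about boundedness from below or about the monotonicity of the flipped gradient; it only illustrates the stationarity property.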
Furthermore, by identifying we obtain from Equation (22) the CC Lagrangian, i.e. (23) Introducing the Lagrangian is a general method for optimisation with constraints. In the special case of CC theory with fixed orbitals, as in this article, Equation (23) demonstrates the equivalence to Arponen's bivariational method [Citation47]. In the context of obtaining an efficient evaluation of the CC energy gradient, the derivative of the variational functional was obtained by Bartlett [Citation49]. The functional itself (Equation (23)) was first used in quantum chemistry by Helgaker and Jørgensen [Citation50] to derive CC energy derivatives. We would also like to mention the related extended CC work of Piecuch and Bartlett [Citation51]. Note that their assumption that the reference determinant is both a left and a right eigenvector of the doubly similarity transformed can be rigorously proven in the continuous case (see Lemma 13 in [Citation31]).
Denoting the dual solution as in Section 3.3.1, it can then be seen that also describes cluster amplitudes parameterising the wave function . Indeed, using the relation , we obtain that together with solve the SE corresponding to the same energy . Assuming non-degeneracy and using the constraint , we arrive at the condition for the primal and dual solutions and . Thus, from the extended CC theory we have obtained a constraint relating to for the traditional CC method.
4. Conclusion
In this article, we have introduced the reader to a local analysis of the CC method and its variations. In particular, we have demonstrated that the Gårding inequalities for and are key as far as a better understanding of the sufficient conditions for a locally unique and quasi-optimal solution of the CC equations is concerned. Moreover, these investigations are geared towards an a posteriori criterion for assessing the CC amplitudes from a given computation. This is a mathematical alternative to the controversial diagnostic suggested in [Citation33]. Indeed, the mathematically derived criteria in Equations (12) and (13) use the total and not just the single amplitudes . Since the single amplitudes could be removed by an appropriate choice of the reference determinant (i.e. an ideal choice of the basis functions), the sufficient condition for a locally unique solution given by Equation (13) puts constraints on the remaining amplitudes (). However, it is not yet a rejection criterion since it only implies locally unique and quasi-optimal solutions under certain conditions. As outlined, the upper bound in Equation (13) is fundamentally different from previous heuristic and potentially misleading diagnostics [Citation33] since the former is derived in a rigorous mathematical framework, where not just the single amplitudes are taken into consideration. We have also shown that the condition on the two particle operator in Equation (17) implies a locally unique CC solution. Here, the condition does not explicitly depend on the amplitude norm and might offer a broader understanding of the reliability of a CC solution. Moreover, the derived condition is independent of the chosen single particle operator. In connection with the extended CC formalism, we have set up a constraint for the exact CC Lagrange multipliers , relating them to the exact CC amplitudes .
Numerical investigations are left for future work.
Disclosure statement
No potential conflict of interest was reported by the authors.
ORCID
A. Laestadius http://orcid.org/0000-0001-7391-0396
References
- R.J. Bartlett and M. Musiał, Rev. Mod. Phys. 79 (1), 291 (2007). doi: 10.1103/RevModPhys.79.291
- F. Coester, Nucl. Phys. 7, 421–424 (1958). doi: 10.1016/0029-5582(58)90280-3
- J. Hubbard, Proc. R. Soc. Lond. A 240 (1223), 539–560 (1957). doi: 10.1098/rspa.1957.0106
- N. Hugenholtz, Physica 23 (1–5), 533–545 (1957). doi: 10.1016/S0031-8914(57)93009-4
- J. Čížek, J. Chem. Phys. 45 (11), 4256–4266 (1966). doi: 10.1063/1.1727484
- O. Sinanoğlu, J. Chem. Phys. 36 (3), 706–717 (1962). doi: 10.1063/1.1732596
- H. Kümmel, Theor. Chim. Acta. 80 (2–3), 81–89 (1991). doi: 10.1007/BF01119615
- J. Čížek, Theor. Chim. Acta. 80 (2–3), 91–94 (1991). doi: 10.1007/BF01119616
- R. Bartlett, in Theory and Applications of Computational Chemistry: The First Forty years, edited by C.E. Dykstra, G. Frenking, K.S. Kim, and G.E. Scuseria (2005), pp. 1191–1221.
- J. Paldus, Theory and Applications of Computational Chemistry (Elsevier, 2005), pp 115–147.
- J.S. Arponen, Theor. Chim. Acta. 80 (2–3), 149–179 (1991). doi: 10.1007/BF01119618
- R. Bishop, Theor. Chim. Acta. 80 (2–3), 95–148 (1991). doi: 10.1007/BF01119617
- K. Raghavachari, G.W. Trucks, J.A. Pople and M. Head-Gordon, Chem. Phys. Lett. 157 (6), 479–483 (1989). doi: 10.1016/S0009-2614(89)87395-6
- R.J. Bartlett, J. Watts, S. Kucharski and J. Noga, Chem. Phys. Lett. 165 (6), 513–522 (1990). doi: 10.1016/0009-2614(90)87031-L
- R. Schneider, Numerische Mathematik 113 (3), 433–471 (2009). doi: 10.1007/s00211-009-0237-3
- T. Rohwedder, ESAIM Math. Model. Numer. Anal. 47 (2), 421–447 (2013). doi: 10.1051/m2an/2012035
- T. Rohwedder and R. Schneider, ESAIM: Math. Model. Numer. Anal. 47 (6), 1553–1582 (2013). doi: 10.1051/m2an/2013075
- T.P. Živković, Int. J. Quantum. Chem. 12 (S11), 413–420 (1977). doi: 10.1002/qua.560120849
- T.P. Živković and H.J. Monkhorst, J. Math. Phys. 19 (5), 1007–1022 (1978). doi: 10.1063/1.523761
- P. Piecuch, S. Zarrabian, J. Paldus and J. Čížek, Phys. Rev. B 42 (6), 3351 (1990). doi: 10.1103/PhysRevB.42.3351
- K. Kowalski and K. Jankowski, Phys. Rev. Lett. 81 (6), 1195 (1998). doi: 10.1103/PhysRevLett.81.1195
- P. Piecuch and K. Kowalski, in Computational Chemistry: Reviews of Current Trends, edited by J. Leszczynski (World Scientific, Singapore, 2000), Vol. 5.
- B. Jeziorski and J. Paldus, J. Chem. Phys. 90 (5), 2714–2731 (1989). doi: 10.1063/1.455919
- W. Meyer, Int. J. Quantum. Chem. 5 (S5), 341–348 (1971). doi: 10.1002/qua.560050839
- W. Meyer, J. Chem. Phys. 58 (3), 1017–1035 (1973). doi: 10.1063/1.1679283
- R. Ahlrichs, F. Driessler, H. Lischka, V. Staemmler and W. Kutzelnigg, J. Chem. Phys. 62 (4), 1235–1247 (1975). doi: 10.1063/1.430638
- W. Kutzelnigg, Methods of Electronic Structure Theory (Plenum Press, New York, 1977), p.129.
- R. Ahlrichs, P. Scharf and C. Ehrhardt, J. Chem. Phys. 82 (2), 890–898 (1985). doi: 10.1063/1.448517
- C. Kollmar and F. Neese, Mol. Phys. 108 (19–20), 2449–2458 (2010). doi: 10.1080/00268976.2010.496743
- F. Wennmohs and F. Neese, Chem. Phys. 343 (2–3), 217–230 (2008). doi: 10.1016/j.chemphys.2007.07.001
- A. Laestadius and S. Kvaal, SIAM. J. Numer. Anal. 56 (2), 660–683 (2018). doi: 10.1137/17M1116611
- F.M. Faulstich, A. Laestadius, S. Kvaal, Ö. Legeza and R. Schneider, arXiv preprint arXiv:1802.05699 (2018).
- T.J. Lee and P.R. Taylor, Int. J. Quantum. Chem. 36 (S23), 199–207 (1989). doi: 10.1002/qua.560360824
- C.W. Kilmister and E. Schrödinger, Schrödinger: Centenary Celebration of a Polymath (Cambridge University Press, Cambridge, 1987).
- D. Yafaev, J. Funct. Anal. 168 (1), 121–144 (1999). doi: 10.1006/jfan.1999.3462
- H. Yserentant, Regularity and Approximability of Electronic Wave Functions (Springer, Berlin, 2010).
- T. Helgaker, P. Jørgensen, and J. Olsen, Molecular Electronic-Structure Theory (John Wiley & Sons, New York, 2014).
- H.J. Monkhorst, Int. J. Quantum. Chem. 12 (S11), 421–432 (1977). doi: 10.1002/qua.560120850
- W. Heisenberg, Die Kopenhagener Deutung der Quantentheorie (Battenberg, Stuttgart, 1963).
- E. Zeidler, Nonlinear Functional Analysis and Its Applications (Springer, New York, 1990).
- J. Céa, Ann. Inst. Fourier (Grenoble) 14 (fasc. 2), 345–444 (1964). doi: 10.5802/aif.181
- J.P. Aubin, Annali della Scuola Normale Superiore di Pisa-Classe di Scienze 21 (4), 599–637 (1967).
- J. Nitsche, Numerische Mathematik 11 (4), 346–348 (1968). doi: 10.1007/BF02166687
- L.A. Oganesyan and L.A. Rukhovets, USSR Comput. Math. Math. Phys. 9 (5), 158–183 (1969). doi: 10.1016/0041-5553(69)90159-1
- T. Kinoshita, O. Hino and R.J. Bartlett, J. Chem. Phys. 123 (7), 074106 (2005). doi: 10.1063/1.2000251
- W. Bangerth and R. Rannacher, Adaptive Finite Element Methods for Differential Equations (Birkhäuser, Basel, 2013).
- J. Arponen, Ann. Phys. 151 (2), 311–382 (1983). doi: 10.1016/0003-4916(83)90284-1
- P.O. Löwdin, J. Math. Phys. 24 (1), 70–87 (1983). doi: 10.1063/1.525604
- R. Bartlett, in Geometrical Derivatives of Energy Surfaces and Molecular Properties, edited by P. Jorgensen and J. Simons (Reidel, Dordrecht, 1986).
- T. Helgaker and P. Jørgensen, in Advances in Quantum Chemistry, edited by Per-Olov Löwdin (Academic Press, Cambridge, MA, 1988), Vol. 19, pp. 183–245.
- P. Piecuch and R.J. Bartlett, in Advances in Quantum Chemistry (Academic Press, Cambridge, MA, 1999), Vol. 34, pp. 295–380.