Publication Cover
Molecular Physics
An International Journal at the Interface Between Chemistry and Physics
Volume 118, 2020 - Issue 5
4,585
Views
14
CrossRef citations to date
0
Altmetric
Research Articles

On derivatives of the energy with respect to total electron number and orbital occupation numbers. A critique of Janak's theorem

Article: e1612955 | Received 25 Jul 2018, Accepted 11 Apr 2019, Published online: 27 May 2019

Abstract

The relation between the derivative of the energy with respect to occupation number and the orbital energy, E/ni=ϵi, was first introduced by Slater for approximate total energy expressions such as Hartree–Fock and exchange-only LDA, and his derivation holds also for hybrid functionals. We argue that Janak's extension of this relation to (exact) Kohn–Sham density functional theory is not valid. The reason is the nonexistence of systems with noninteger electron number, and therefore of the derivative of the total energy with respect to electron number, E/N. How to handle the lack of a defined derivative E/N at the integer point, is demonstrated using the Lagrange multiplier technique to enforce constraints. The well-known straight-line behaviour of the energy as derived from statistical physical considerations [J.P. Perdew, R.G. Parr, M. Levy and J.L. Balduz, Phys. Rev. Lett. 49, 1691 (1982)] for the average energy of a molecule in a macroscopic sample (‘dilute gas’) as a function of average electron number is not a property of a single molecule at T=0. One may choose to represent the energy of a molecule in the nonphysical domain of noninteger densities by a straight-line functional, but the arbitrariness of this choice precludes the drawing of physical conclusions from it.

GRAPHICAL ABSTRACT

1. Introduction

Fractional occupation numbers have remained popular ever since Slater introduced his expressions of the Hartree–Fock (HF) energy [Citation1] and of the Xα energy (exchange-only LDA, XLDA) using explicit occupation numbers. His relation [Citation2] (1) Slater: Emodel/ni=ϵimodelmodel=HF,Xα,LDA,GGA,hybrid,(1) holds for the mentioned model total energies. Slater applied his derivation explicitly to Hartree–Fock and to exchange-only LDA (Xα), but his method only relies on the application of the chain rule to density and density matrix-dependent functionals, so is valid for all cited cases. Slater used relation (Equation1) to solve a problem that plagued his Xα scheme and all (semi-)local density functional approximations (LDFAs) since then: the orbital energy of the highest occupied orbital is not close to minus the ionisation energy I. In the case of HF, it can be shown that the HOMO orbital energy ϵHHF is close to I, according to Koopmans' frozen orbital approximation for the ion, ϵHHF=IHF(froz.orb.). The frozen orbital approximation leads to ϵHHF being a bit lower than the exact I (often about 1 eV). But for the local and semi-local density functional approximations (LDFAs) like LDA and GGAs, it is always some 4–6 eV too high. Slater introduced his so-called transition state (TS) method, which uses (Equation1), to obtain a good approximation to the ionisation energy as the HOMO orbital energy of a half-occupied orbital. See Ref. [Citation3] for a recent review of orbital energies and (fractional) occupation numbers.

The interest in fractional occupation numbers has stimulated their introduction in density functional theory (DFT). Janak's theorem [Citation4] (2) Janak: EnpKS=ϵpKS(2) expresses relation (Equation1) for the exact energy and Kohn–Sham orbital energies. The position of noninteger electron systems (implicit in (Equation2)) in straightforward DFT for single molecules at T=0 is actually remarkable if one recognises that fractional electrons do not exist. One expects that everything that is ‘explained’ or ‘proven’ with fractional electrons, can also be treated with integer electron numbers, cf. Ref. [Citation5]. In the case of molecular ionisation this is clear: On the one hand, Slater's relation (Equation1) only has meaning if occupation numbers can become fractional, and it can be used to prove Koopmans' theorem in HF using fractional electron numbers (see Section 5). It can also be used to calculate with the TS method the ionisation energies including orbital relaxation from orbital energies at fractional occupation number. On the other hand, Koopmans' theorem has first been formulated with just integer electron systems, and Slater's TS method just approximates total energy differences of integer electron systems. Slater's TS method is a mathematical device, it does not lay claim to any physical meaning of the fractional electron numbers. Janak's theorem in exact DFT, however, requires that physical meaning be given to (the energy of) noninteger electron systems (otherwise the derivative of Equation (Equation2) would not exist). The introduction of an ensemble description of noninteger electron systems by Perdew, Parr, Levy and Balduz (PPLB) [Citation6] is very well known, but it should be kept in mind that this is based on statistical mechanics (using the grand canonical ensemble) [Citation7]. It should be realised that the results hold for the average energy as a function of average electron number in, for instance, a macroscopic gas of molecules in the T0 limit, not for a single atom or molecule.

We argue in this paper that Janak's theorem is in fact not a valid theorem in exact DFT (meaning DFT for single finite electron systems at T=0). Orbital occupation numbers do not properly belong to the KS theory. The KS system of noninteracting electrons will have the KS determinant as wavefunction. Orbitals either occur in that determinant (are occupied) or not. In order for the derivative in (Equation2) to be defined, the exact energy corresponding to a small increase and decrease of some np (in a neighborhood of the integer occupation numbers) should be defined. It is not clear what that means in terms of the noninteracting electron system of the KS model (a KS orbital cannot ‘appear more’ or ‘appear less’ in the determinant). Increase of an orbital occupation above 1, would surely entail increase of the integrated density beyond the integer N -value. However, for such densities E is not defined in exact DFT, and therefore the derivative E/N is not defined, and neither would be the derivative with respect to orbital occupation E/ni.

We discuss in Section 2 the optimisation of the density according to the Hohenberg–Kohn theorem, where the constraint of fixed electron number N can, in principle, be treated with the Lagrange multiplier μ=E/N. The derivative E/N, however, is not defined in DFT due to the domain on which the Hohenberg–Kohn functional (and Levy–Lieb and Lieb functionals) are defined being limited to N-conserving densities. The standard choice in functional analysis of E[ρ]=+ [Citation8–11] for ρdrN is problematic for definition of E/N. However, the essential arbitrariness of E/N does not preclude the optimisation of the energy by density variation (at the integer point). It can be done either by constraining the density variation to normconserving ones, or by the Lagrange multiplier technique. The theory of constrained derivatives is discussed, in particular how the total functional derivative of the energy δE/δρ(r) can be split into the derivative under the constraint of constant N, written δE/δNρ(r), and the derivative with respect to particle number N, E/N. Only the former is in principle needed, but is difficult to obtain. Therefore, a constraint like ρdr=N is traditionally handled with the Lagrange multiplier technique. The latter, however, requires that E/N (which enters the formalism as the force of constraint) is known. This raises the question how this technique should be applied if E/N is not defined. Optimisation is possible, but requires that a suitable choice of the essentially arbitrary E/N should be made (suitable means finite and continuous in the neighbourhood of the integer N point).

In Section 3, the consequences of E/ni not being defined are discussed. On the basis of the theory of constrained functional derivatives discussed in Section 2, a critique of Janak's theorem is given. The conclusion is that Janak's theorem has no validity in (exact) Kohn–Sham DFT. It is also investigated whether a KS-like formalism can be set up with possibly fractional occupation numbers (summing up to N) by dropping the KS physical model of N noninteracting electrons in N one-electron states in a local potential. This is shown to revert to the KS model.

Even if systems with a noninteger electron number are not physical, one may still define them as auxiliary systems to obtain useful results for integer electron systems. We have already mentioned Slater's TS method which uses ionisation by half an electron in order to derive an ionisation energy (physical quantity) from an orbital energy of an unphysical half-occupied orbital. This method defines, by a specific introduction of (fractional) occupation numbers, the energy of the auxiliary system as a (nearly) quadratic function of the occupation number. A different introduction of (fractional) occupation numbers, is the one based on statistical mechanical considerations of macroscopic systems at a finite temperature or judiciously chosen T0 limit, as introduced by Mermin [Citation12] and Perdew et al. [Citation6, Citation7]. These occupation numbers describe the distribution of the electronic systems in the macroscopic sample over states with different occupations of the one-electron levels, see e.g. the derivation of the Fermi–Dirac distribution in [Citation13]. So they are physical and describe averages. The search over ensembles of integer electron density matrices of PPLB [Citation6] leads, in the T0 limit, to linear interpolation (straight-line behaviour) of the average energy and the average electron densities for fractional electron number.

If an analogous procedure is followed for a single finite quantum system (atom or molecule), using an ensemble of density matrices with different electron numbers, density optimisation will again result in a specific (‘straight’) path through density space and a straight-line energy as a function of fractional electron number, see Section 4.1. However, a single system with noninteger electron number is nonphysical. This represents just one out of many possibilities to choose the arbitrary (because nonphysical) E/N in this case. This choice should not be considered the only viable or even ‘exact’ definition [Citation14–16] of E[ρ] of a single quantum system (atom or molecule) for ρdrN. The nonexistence of a noninteger electron system makes it impossible to ever obtain a benchmark value for the energy (wavefunction) of such a system.

It has been argued that the straight-line behaviour of the energy can be derived for an atom or molecule from size-consistency arguments, see Section 4.2. We caution that the underlying assumption is that the functional be local, which is not the case for the exact functional. A straight-line energy functional yields dissociation of any diatomic molecule into neutral atoms [Citation6] (Section 4.3), which may be considered a strong point. However, we draw attention to the fact that this again relies on the unwarranted assumption of locality of the functional. Indeed, Section 4.4 discusses that the jump of the KS potential when an integer is passed, which is inherent in the straight-line functional, is a local property of a fragment in the dissociation and is not compatible with a correct description of dissociation of a heteronuclear diatomic.

Finally in Section 5, the possibility of a meaningful use of orbital occupation numbers in approximate energy expressions is highlighted. The freedom to introduce such additional variables in the total energy expressions is stressed, which may be done in Slater's way or in a different way. The Lagrange multiplier technique can be used to constrain the occupation numbers to physically meaningful values (typically 1 and 0). From this then emerges a physical meaning of the Lagrange multipliers {λi} for this constraint, namely they become equal to the Lagrange multipliers {ϵi} for the normalisation constraint, λi=ϵi. It is shown that relation (Equation1) only holds if a particular choice is made for the dependence of the total energy on occupation numbers. Suppose one uses a model energy that allows to identify a derivative with respect to an occupation number with the related orbital energy according to the Slater relation (Equation1), e.g. Emodel/nLUMO=ϵLUMO. One may equate the LUMO orbital energy, in this example, to (E/N)+, but the essential arbitrariness of the latter precludes any conclusion about the value of this orbital energy. Section 6 summarises.

We should stress that we are dealing here with systems in their ground state at T=0. The treatment of finite temperature effects and the statistical physics of large numbers of systems that can exchange energy (canonical ensembles) or also particles (grand canonical ensembles) with a bath are outside the scope of this treatment. The properties of individual quantum systems at T=0 are required in the first place, they determine the behaviour of ensembles. Individual systems can be in different states (excited, with different numbers of electrons) but the probabilities of occurrence of such states in statistical ensembles at finite temperatures is not the subject of Slater's or Janak's relations.

2. Energy derivatives and indeterminacy of the Lagrange multiplier for constant electron number N

2.1. The Lagrange multiplier technique and the force of constraint

We recall a few salient features of the Lagrange multiplier method [Citation17, Citation18] for the imposition of constraints when finding an extremum of a function. Figure  shows an example in the case of an objective function f (the function for which an extremum has to be found) in two dimensions, under the constraint h(x1,x2)=c. At an arbitrary point, depicted to the left in the domain of feasible points (those obeying the constraint, i.e. on the curve h(x1,x2)=c), there is a component f// of f along the curve of feasible points, which implies that the optimal point has not been reached. The perpendicular component is compensated by the ‘force of constraint’ exerted by the constraint, λh=f. [It is common practice to use the language of mechanical problems and denote f as a potential in which the minimum is searched and where the probe particle moves in the feasible domain until no forces are any more exerted.] The optimal point (x1,x2) is characterised by the absence of a parallel force, f//(x1,x2)=0, and the Lagrange multiplier λ at that point is a measure for the strength of the necessary force of constraint.

Figure 1. A two-dimensional example of application of the Lagrange multiplier: finding the minimum value of the function f(x1,x2) under the constraint h(x1,x2)=c. The crossing point of h=c and f=c5 represents an arbitrary point of the feasible set of points (those obeying the constraint) at which the component f// in the domain of feasible points is still nonzero. (x1,x2) is the optimal point, where f//(x1,x2)=0 and f(x1,x2) is compensated by the force of constraint λh(x1,x2).

Figure 1. A two-dimensional example of application of the Lagrange multiplier: finding the minimum value of the function f(x1,x2) under the constraint h(x1,x2)=c. The crossing point of h=c and f=c5 represents an arbitrary point of the feasible set of points (those obeying the constraint) at which the component ∇f// in the domain of feasible points is still nonzero. (x1∗,x2∗) is the optimal point, where ∇f//(x1∗,x2∗)=0 and ∇f⊥(x1∗,x2∗) is compensated by the force of constraint −λ∗∇h(x1∗,x2∗).

In textbook examples, the objective function is typically defined also outside the domain of feasible points, so that f is defined. This is also necessary for straightforward application of the Lagrange multiplier technique, which requires all derivatives of the Lagrangean, without constraints, to be zero: (3) L(x1,x2)=f(x1,x2)λ(h(x1,x2)c)Lxi=fxiλhxi=0 i(3) It may happen of course that the objective function is not defined outside the feasible domain, for instance for physical reasons (as in our case of the Hohenberg–Kohn functional). That is not a problem if f//(x1,x2) can be determined: the point (x1,x2) where f//(x1,x2)=0 will still be the same. Application of the Lagrange multiplier technique, however, requires that f(x1,x2) is defined outside the feasible domain, so that the derivative f(x1,x2) can be determined. As a way out, one can simply extend in an arbitrary but suitable way the definition of f into the nonfeasible domain, so the derivatives of f exist and f is defined. The extension is arbitrary in the sense that the actual value of f is unimportant, it only affects the magnitude of the force of constraint which in this case is only an auxiliary quantity (chosen by the user), not a physical quantity. The extension must be suitable in the sense that the derivative must exist. A nonexistent derivative, for instance different left and right derivatives at the point (x1,x2), precludes the application of the Lagrange multiplier method; the whole purpose of this method is to enable the calculation of the full derivatives, without any possibly very difficult restrictions on those derivatives due to constraints.

2.2. Differentiation of E[ρ] and the arbitrariness of E/N and of the Lagrange multiplier for the constraint of fixed electron number

In the Hohenberg–Kohn theory, we have the situation that the objective functional Ev[ρ] is not defined outside the feasible domain of densities that integrate to N electrons, ρdr=N. Part of the derivative δE/δρ is therefore undefined (we omit the subscript v when there is no risk of confusion). Following Parr and Bartolotti [Citation19] (see also Gál [Citation20]), we make this explicit by writing the density as the product of N and a ‘shape function’ σ(r) which is normalised to 1: (4) ρ(r)=Ng(r)g(r)drNσ(r),σ(r)dr=1(4) where g(r) can be any decent nonnegative function. The density variation δρ(r) can be split in N-conserving and shape-conserving components, (5) δρ(r)=Nδσ(r)+σ(r)δNδNρ(r)+δσρ(r),(5) where the subscripts N and σ denote variations δρ(r) where N or σ are kept constant, respectively. δσ should obviously integrate to zero (and should be orthogonal to σ, σ(r)δσ(r)dr=0, see Appendix 1). The energy variation becomes (6) δE[ρ]=δE[ρ]δρ(r)δρ(r)dr=δE[ρ]δNρ(r)+δE[ρ]δσρ(r)δNρ(r)+δσρ(r)dr=δE[ρ]δNρ(r)δNρ(r)dr+δE[ρ]δσρ(r)δσρ(r)dr(6) where we have introduced the N-conserving and shape-conserving constrained derivatives, indicated with subscript N and σ in the functional derivative, following Gál [Citation20]. The constrained derivatives for functionals are like partial derivatives for functions, with of course special adaptations. In going from the second to the third line in (Equation6), we have used the fact that the cross terms (N-conserving derivative working on shape-conserving density change, and vice versa) are zero. See Appendix 1 for further elaboration and definitions. Note the analogy with the partial derivatives of functions of more variables, δf=(f/x1)δx1+(f/x2)δx2, where there are also no cross terms. Now (7) δE[ρ]δρ(r)σ(r)δNdr=δE[ρ]δρ(r)ρ(r)NdrδN=E[ρ]NδN(7) (cf. Equation (EquationA8)) which defines the partial derivative of E[ρ] with respect to N, E/N. This is r-independent, a constant. However, it is undefined in HK theory, since E[ρ] is not defined for noninteger N's in the neighborhood of an integer N.

The problem of the value of Ev[ρ] outside the domain of normalised densities has been addressed in various ways, usually in discussions of the Hohenberg–Kohn functional F, defined according to (8) Ev[ρ]=F[ρ]+ρvdr;F[ρ]=Ψ[ρ]|Tˆ+Wˆ|Ψ[ρ](8) The expectation values of the kinetic energy (Tˆ) and the electron–electron Coulomb interaction energy (Wˆ) have been defined for fixed electron number (N-electron) systems according to the Hohenberg–Kohn, Levy–Lieb or Lieb prescriptions [Citation9, Citation21]. The Lieb functional FL[ρ], which leads to a convex functional, is always used in mathematical treatments (see earlier work by Levy [Citation22] and Valone [Citation23] on searches over density matrices of N-electron states) (9) Lieb:FL[ρ]=infDˆρTrDˆTˆ+Wˆ(9) (10) Dˆ=i=1nλi|ΨiNΨiN|;0λi1,i=1nλi=1(10) Lieb [Citation8] in his study of functional analysis aspects of DFT proposed to define FL[ρ] as FL[ρ]=+ for densities that fall outside the domain of positive functions that integrate to N. Infinite values are well defined in the theory of convex functionals, and they are usually introduced to deal in a simple way with domain issues [Citation9]. This choice has usually been followed in analyses of the differentiability of FL [Citation9–11, Citation24, Citation25]. It is however very different from the choice of Ref. [Citation6] to define FL[ρ] for such densities by an extension of the density matrices over which the search is performed to ensembles of density matrices with different integer N values, see Section 4.1. That search then leads to linear interpolation of the energy between the N- and (N+1)-electron ground state energies at the electron-rich side (with derivative A) and between the N- and (N1)-electron ground state energies at the electron deficient side (with derivative I). But an infinite derivative or the lack of differentiability at the integer point (different left and right derivatives) invalidates the Euler–Lagrange equation (Equation14) (the force of constraint cannot be determined). Lieb has also indicated that more regular choices can be made, such as proportionality of FL to ρdr=N (N noninteger), e.g. FL[ρ]=(ρdr)FL[σ(r)] with straightforward derivative (see remarks at Theorem 3.3 [Citation8]). We stress here the arbitrariness of the definition of FL in the noninteger N domain, hence the freedom to make a suitable choice.

In applications of the Lagrange multiplier method, it is required that E[ρ]/N is a defined quantity. Then we should make sure our results are independent of the choice we have made. In principle (when trying to determine the N-electron ground state density ρ0N), we do not need E[ρ]/N because we could only search in the domain of N-electron densities, so at ρ0N we must have, using the density shape factor g of Equation (Equation4) (cf. [Citation19, Citation26]) (11) δE[ρ0N]δg(r)=δE[ρ0N]δρ(r)δρ(r)δg(r)dr=0(11) where we have used the chain rule. Since [Citation26] (12) δρ(r)δg(r)=Ng(r)drδ(rr)g(r)g(r)dr(12) we have from (Equation11) and (Equation12) at ρ0N (13) δE[ρ0N]δρ(r)=δE[ρ0N]δρ(r)g(r)g(r)drdr(13) This shows that at ρ0N the functional derivative δE/δρ(r) is not a function of r but is a constant. This fits in with the fact that at ρ0N the functional derivative for N-conserving variations of ρ, δE[ρ0N]/δNρ(r), is zero, see next paragraph (and in agreement with Equation (Equation11)). So the full derivative δE[ρ0N]/δρ(r) reduces to the constant (independent of r) E[ρ0N]/N, as is evident by comparing (Equation13) with (Equation7). We cannot derive this constant from Equation (Equation13), since any constant value is compatible with that equation. That is in order, since we have required the functional derivative to be zero only with respect to shape variation, Equation (Equation11). We can expect to obtain a result that is compatible with any value of E[ρ0N]/N. The fact that E[ρ0N]/N is not defined in DFT, since HK theory does not provide values of the functional E[ρ] for N-nonconserving ρ, and therefore not of the derivative, is not a problem. It should suffice to only require the derivative to be zero for N-conserving variations of ρ, δE[ρ0N]/δNρ(r)=0, or δE[ρ0N]/δg(r)=0. There is a risk that it is not recognised that the undefined constant is part of the total derivative, but not of the constrained derivative, see comments on Equation (Equation17) and see Appendix 2.

In practice, constrained derivatives are not used since they are hard or impossible to obtain, and the standard Lagrange multiplier method is used because it has the great advantage that it allows to work with unconstrained derivatives, which are usually straightforwardly obtained. We try to find the minimising density from (14) δδρ(r)E[ρ]μρ(r)drNδρ(r)dr=0(14) (N is integer). This should hold for arbitrary δρ. We may first choose for δρ(r) just δσρ(r) and must have at the optimum density ρ0N (15) δE[ρ0N]δNρ(r)+E[ρ0N]Nμδσρ(r)dr=0.(15) Since [δE[ρ]/δNρ(r)]δσρ(r)=0 (see Appendix 1) this yields μ=E[ρ0N]/N: the force of constraint is determined by the choice of E/N. Using next the remaining δρ(r) space of norm-conserving {δNρ(r)}, and substituting μ=E[ρ0N]/N, it is clear that at ρ0N we must have (16) δE[ρ0N]δNρ(r)δNρ(r)dr=0δE[ρ0N]δNρ(r)δρ(r)dr=0δE[ρ0N]δNρ(r)=0(16) where the second line follows since we already had δE[ρ0N]/δNρ(r)δσρ(r)dr=0, as used above. Here it does not suffice to use only the subspace {δNρ(r)} and require that the N-conserving derivative is only zero for application on that subspace. That requirement alone does not completely fix δE[ρ0N]/δNρ(r).

There is a pitfall lurking here. It has often been noted in the literature [Citation5, Citation9, Citation19, Citation26–28] that, since only norm conserving density variations are allowed in HK theory, derivatives like δE/δρ(r) (or δF[ρ]/δρ(r)) are only defined up to a constant: if δNρ(r)dr=0, then (17) δE=drδEδρ(r)δNρ(r)is equal to δE=drδEδρ(r)+CδNρ(r)(17) It then appears that the same argument can be applied to the constrained derivative δE/δNρ(r), which is then also believed to be only defined up to a constant. However, full definition of the functional derivatives requires their ‘operation’ to be defined on arbitrary δρ, not only normconserving δNρ. As we have seen, this prohibits a free constant in δE[ρ0N]/δNρ(r). In the Hohenberg–Kohn theory, there is such an undefined constant in the full derivative δE[ρ0N]/δρ(r), which stems from the HK restriction to integer electron systems, causing E/N to be an undefined quantity in the theory. A constant is not allowed in the constrained derivative δE[ρ0N]/δNρ(r) since it would give δE[ρ0N]/δNρ(r) a component in the subspace {δσρ(r)}, to which it should be ‘orthogonal’ (note the analogy of δE[ρ0N]/δNρ(r) with f// of Figure ). See also discussion in Appendix 2.

We have seen that it is advisable to define E[Nσ] as a continuous function of N in a neighbourhood of the integer value N, so that the derivative E[Nσ]/N exists. We mentioned such a choice, derived from Lieb, earlier. Parr and Yang [Citation26, p. 84] suggest to take a parabolic fit over the three points E(N1),E(N) and E(N+1). This yields a continuous E(N) in the neighbourhood of N, with E[ρ0N]/N=(I+A)/2, i.e. Mulliken's electronegativity. Görling [Citation5] has analysed the arbitrary constant in the total derivative of the energy (or equivalently the Hohenberg–Kohn functional F[ρ]) along similar lines, and fixes it by adhering to the energy behaviour at noninteger N according to the PPLB straight lines picture of Ref. [Citation6]. This has the difficulty of different left and right derivatives at the integer point. It would also be possible to choose E[ρ] independent of N in the neighbourhood of N, which would yield E[ρ0N]/N=0. All of this is compatible with the fact that E/N is arbitrary.

3. Janak's theorem: not a DFT theorem

Relation (Equation2) is widely quoted as Janak's theorem and is often considered a fundamental relation in Kohn–Sham DFT. However, there are difficulties with this relation. One can try to formulate relation (Equation2) for a mathematically defined energy where occupation numbers (that supposedly can vary continuously) are introduced. Janak [Citation4] introduced occupation numbers in a total energy expression (called E~), which we call EKS1 because it has occupation numbers to the first power (see Section 5 for an alternative) (18) EKS1=σ=α,βpnpσφpσ|122+vext|φpσ+12p,qσ,τ=α,βnpσnqτ×φpσ(x1)φqτ(x2)φpσ(x1)φqτ(x2)|r1r2|dx1dx2+Exc[ρα,ρβ],ρσ(r)=pnpσ|φpσ(r)|2(18) The linear dependency on occupation numbers in the expression for the density and for the one-electron terms, which had earlier been introduced in the Xα and Hartree–Fock total energy expressions, is arbitrary (cf. Section 5) but plausible. For integer values (1 and 0) of the occupation numbers, there is no effect. In order for EKS1 to have a defined derivative with respect to an occupation number, it is necessary that Exc[ρα,ρβ] is defined for densities with fractional electron numbers (a neighbourhood of an integer np). But there is a problem with the value of Exc[ρ] in Equation (Equation18) at noninteger electron numbers. Equation (Equation18) is in fact the defining equation for Exc[ρ], its value is determined by all other terms (in particular EKS1[ρ]) which need to be defined. But at noninteger electron numbers, the energy EKS1[ρ] (which should be Ev[ρ]) is not defined, and therefore Exc[ρ] is not defined. There are simply no physical systems with a noninteger number of electrons, for which an energy Ev[ρ] would be obtainable. Also the Kohn–Sham construction of a noninteracting system of electrons is not possible for fractional electron numbers, so Exc[ρ] at noninteger electron densities is not a physical quantity. It is of course possible to choose an energy for noninteger electron densities in some way or another, for instance using ensembles of integer electron systems (see Section 4.1), but then it has to be kept in mind that no physical information can follow from such a man-made choice. Janak [Citation4] did not specify EKS1[ρ] for noninteger densities.

We may wonder if Janak's theorem (Equation2) may at least have meaning at the integer point, with occupation numbers introduced according to EKS1 (Equation18). Let us consider explicitly the derivative (19) EKS1npσρ0N=φpσ|122+vext+vCoul|φpσ+δExcδρ(r)ρ0Nρnpσρ0Ndr=φpσ|122+vext+vCoul+(vxc+C)|φpσ=ϵpσ+C(19) (we consider for simplicity the spin-compensated case where vxc is the same for spin-up and spin-down orbitals). We have indicated the occurrence of an unknown constant C=Exc/N. It is unknown because the derivative for a change δN in the number of electrons is unknown, see Section 2. In the total derivative of Exc, (20) δExc[ρ0N]δρ(r)=δExc[ρ0N]δNρ(r)+Exc[ρ0N]N,(20) the N-conserving derivative δExc[ρ0N]/δNρ(r) taken at the ρ0N point may be identified with the exchange–correlation potential vxc (see below), but still there is the undefined constant Exc/N. So the derivative E/np is not a defined quantity in Kohn–Sham DFT, not even at the ground-state density. It certainly is not defined at fractional electron densities, where not even the N~-conserving derivative δE[ρ]/δN~ρ(r),ρ(r)dr=N~=N+ω is defined. At the N-electron ground-state density ρ0N, we are free to choose the constant Exc[ρ0N]/N, but then we cannot draw conclusions about the physics of a system from that choice.

The undetermined constant in the functional derivative, i.e. in the xc potential, also arises in the derivation of the Kohn–Sham equations, but it is harmless there. In the traditional derivation of the KS equations [Citation29], one minimises the KS expression for the energy with respect to variations in the density via the orbitals, from which the one-electron Kohn–Sham equations follow. The application of the chain rule in the variation of the Exc term in the energy leads to the exchange–correlation potential vxc(r)=δExc/δρ(r), without any precaution being taken that the total derivative is not defined (i.e. not for N-nonconserving density variation). However, the chain rule for the Exc term gives (21) δExc=pNδExcδφp(r)δφp(r)dr=pNδExcδρ(r)δρ(r)δφp(r)drδφp(r)dr=pNδExcδNρ(r)+ExcNφp(r)δφp(r)dr=pN(vxc(r)+C)φp(r)δφp(r)dr(21) where vxc(r) is the defined derivative δExc/δNρ(r) (because Exc is defined for N-conserving densities) and one is left with the undefined constant C=Exc/N. But in this case, this arbitrary constant in the KS potential does not have any physical consequences. A constant in the potential (which extends over all space, including asymptotic regions) just shifts the whole eigenvalue spectrum up or down. This is the well-known gauge freedom of a local potential, which can be eliminated always by choosing the potential to go to zero at infinity, so that all calculations work with the same gauge and the orbital energies become comparable. This is commonly done. Although the commonly used definition vxc(r)=δExc/δρ(r) is strictly speaking not correct, or not complete, since it ignores the fact that the HK theorem only allows N-conserving densities, we now see that this omission does not have any consequences. There is not a similar saving grace in the derivation of Janak's theorem, since there the undefined constant exactly expresses that it is undefined what the theorem claims to tell, namely how the energy changes under a change δN=δnp.

One may wonder if fractional occupations could be meaningfully introduced by abandoning the Kohn–Sham model system of N noninteracting electrons in a local potential. Parr and Yang [Citation26, § 7.6] have raised the question if then the Janak theorem could be put on a secure footing. They consider a generalisation of the noninteracting kinetic energy Ts[ρ] to a Janak kinetic energy (22) TJ[ρ]=minni,ψiρiniψi|122|ψi(22) where the search is over all possible ni (0ni1) and orthonormal orbitals ψi yielding the given density (constraining it to N electrons) (23) ρ(r)=ini|ψi(r)|2ini=N(23) The exact total energy functional can now be written (24) E[ρ]=TJ[ρ]+v(r)ρ(r)dr+WHartree+ExcJ[ρ](24) where WHartree=(1/2)ρ(r1)ρ(r2/r12)dr1dr2 and ExcJ[ρ] is defined by this equation and is different from the standard Exc[ρ] if TJ[ρ] would be different from Ts[ρ]. The question is if this set-up might lead to noninteger ni. Then occupation numbers would have been introduced, possibly fractional, without the physical model of Kohn and Sham of noninteracting electrons in a local potential, which is defined with integer (1 and 0) occupation numbers. With integer ni, the model of Equations (Equation22) and (Equation24) would reduce to the standard noninteracting kinetic energy of the KS determinantal wavefunction with the given ρ and lowest kinetic energy.

For E[ρ] of (Equation24) to be defined, ρ must be a density belonging to a ground state wavefunction (HK) or at least an N-electron wavefunction or density matrix (Levy–Lieb). Then ExcJ[ρ] is only defined for such integer-N densities. Parr and Yang [Citation26] derive, at a given set of {ni}, the KS-like equations for the ψi which have ni0 (25) 122+veffJ(r)ψi=ϵiψi(25) where veffJ=v(r)+vHartree(r)+vxcJ(r) with vxcJ(r)=δExcJ/δρ(r). Differentiating the total energy with respect to an occupation number ni is again problematic since the energy E[ρ+δρ] is not defined for a small N-nonconserving density change δρ=δni|ψi|2, and neither is ExcJ[ρ+δρ]. Ignoring this problem and again applying the chain rule for the derivative of ExcJ yields (26) Eni=ϵi(26) The behaviour of the energy (Equation24) under occupation number changes according to Equation (Equation26) is well known to lead to Aufbau, i.e. ni=1 for the lowest orbitals emerging from the optimisation, and na=0 for the remaining ‘virtual’ orbitals. This is easily seen by considering infinitesimal occupation number changes. The Janak–Kohn–Sham system with TJ[ρ] instead of Ts[ρ] then simply reverts to the Kohn–Sham system.

This raises the question if the Janak kinetic energy is actually different from Ts[ρ]. We can prove that this is not the case, i.e. the orbitals resulting from the minimisation of Equation (Equation22) are the KS orbitals belonging to ρ and the optimal occupation numbers obey Aufbau, so (27) TJ[ρ]=Ts[ρ].(27) Let us minimise TJ[ρ] by varying {ni} and {ψi} under the constraint that ini|ψi(r)|2=ρ(r) at each r, for which purpose we introduce the r-dependent Lagrange multiplier μ(r). Normalisation of the ψi is maintained with the usual Lagrange multipliers ϵi. The Lagrangean LJ becomes (28) LJ=iniψi|122|ψi+μ(r)ini|ψi(r)|2ρ(r)driϵi|ψi(r)|2dr1(28) For a fixed set of {ni}, optimisation of the orbitals [Citation26] yields the equations (29) 122+μ(r)ψi=ϵiψi(29) for the occupied orbitals, where ϵi=ϵi/ni. The Lagrange multiplier μ(r) acts as local potential in this one-electron equation. Orthogonality of the orbitals then follows from the hermiticity of the operator and need not be enforced separately [Citation26]. Before determining μ(r) in the usual way from the constraints, we note that with a fixed potential μ(r) the energy of the noninteracting electron system with Janak kinetic energy would be (30) EsJ=iniψi|122|ψi+μ(r)ρ(r)dr(30) Minimisation of this energy by variation of the {ni} and {ψi} under the constraint ini|ψi(r)|2=ρ(r) leaves the μ(r)ρ(r)dr term invariant and is therefore equivalent to (should lead to the same {ni} and {ψi} as) the minimisation in the definition of Janak kinetic energy according to (Equation22). Considering then variation of the {ni}, we note that now there is no problem with taking the derivative with respect to ni in (Equation30) since an exact energy for a noninteger electron system is not required for its definition. The derivative will just be (31) EsJni=ψi(r)122+μ(r)ψi(r)dr=ϵi(31) So the minimum will be obtained for Aufbau. If the fixed set of {ni} with which we started was inadvertently not the Aufbau choice of {ni}, which we may call {niA}, the condition iniA|ψi(r)|2=ρ(r) will not be obeyed with the present set of {ψi} and {ni}. So then we have to repeat the process with occupation numbers according to Aufbau. The potential μA(r) will become exactly the KS potential vs[ρ](r) belonging to density ρ(r), since that KS potential is unique. So TJ[ρ] is not a new ‘Janak’ kinetic energy but it is just Ts[ρ]. There is no Janak kinetic energy and no Janak–Kohn–Sham model.

Recently Li et al. [Citation16] studied the Janak construction with a different purpose, namely to see if in case of a noninteger ρ this would not lead to occupation of higher virtual KS orbitals but just to the fractional occupation of frontier orbitals (Aufbau). They concluded to this Aufbau behaviour of the occupation numbers. However, we have indicated that E[ρ] is problematic in case ρ is a noninteger density. Even if a TJ[ρ] exists for noninteger densities, it cannot be used in Equation (Equation24), since with a noninteger density, E[ρ] in that equation is not defined.

Valiev and Fernando [Citation30] have objected against the Janak theorem on somewhat different grounds. They point out that the results of the work by Englisch and Englisch [Citation31] imply that the energy of a noninteracting system at fractional values of the occupation numbers (except for the degenerate levels at the Fermi energy) would not be differentiable with respect to the density.

We are not considering here fractional occupations of degenerate levels at the Fermi energy, which is theoretically [Citation8, Citation32] and practically [Citation33, Citation34] a well understood case, and neither do we consider density matrix functional theory (DMFT). Using the spectral resolution of the 1-matrix γ(x1,x1)=pnpψp(x1)ψp(x1), the optimisation in DMFT is with respect to both the natural orbitals and the occupation numbers. In DMFT, the latter are an intrinsic part of the theory, they typically become all fractional. The inequality constraints 0np1 have to be applied with the Karush–Kuhn–Tucker method, see Giesbertz and Baerends [Citation35]. Orbital energies start to play a very different role, cf. Gilbert's [Citation36] famous finding of degeneracy for all fractionally occupied natural orbitals, which usually means all natural orbitals.

4. Energy behaviour at noninteger electron numbers

Ever since the seminal PPLB paper [Citation6], the straight-line behaviour of the energy over the (N,N+1) and (N,N1) intervals has received much attention in the literature. It should be emphasised that PPLB addresses the issue of fluctuating particle number as can occur in macroscopic samples at finite temperature, and use the grand canonical ensemble of statistical mechanics to treat this variable particle number. They derive straight-line behaviour in the T0 limit (see the full details treated in [Citation7]). On the basis of this work, it is sometimes assumed that even for a single quantum system (atom or molecule) E/N is NOT arbitrary, but has to be I on the (N,N1) interval and A on the (N,N+1) interval, with discontinuous derivative at N. We do not feel that it is a correct interpretation of the results of Ref. [Citation6] to consider this behaviour as mandatory or ‘exact’ for a single molecule. For the common case of DFT calculation on a single molecule at T=0, the interpolation of the density between the integer N and N±1 ground-state densities and the corresponding straight-line behaviour of the energy constitute just one choice for the essentially arbitrary continuation (see Section 2) to nonphysical fractional electron number. We will discuss this in Section 4.1.

One can also argue in favour of linear energy behaviour on the basis of size-consistency requirements for dissociation of molecules, as has been done by Yang, Zhang and Ayers [Citation37, Citation38]. This is the subject of Section 4.2, where we argue that it is not size-consistency but the (unwarranted) requirement of locality of the functional that leads to the linear behaviour. In Section 4.3, we recall that application of the straight-line energy (assuming it holds for single quantum systems) has the well-known success of describing dissociation of a molecule into integer electron fragments, but we caution that again the unwarranted local approximation is invoked. In Section 4.4, it is shown that the local approximation, which is inherent in the application of the straight-line energy to dissociation, also causes it to fail: the derivative discontinuity jump in the KS potential is not quantitatively correct, in that it does not lead to proper dissociation of a heterogeneous electron pair bond.

4.1. Straight-line energy behaviour from grand canonical ensemble considerations and for single molecules

The statistical mechanical approach of [Citation6, Citation7] implies that the straight-line behaviour applies to the average energy of the molecules in a dilute macroscopic gas of molecules, that can exchange electrons with a reservoir, at the properly taken T0 limit. Also a very simple consideration at just T=0 makes that clear. We can then take μ as a tunable parameter of the reservoir, regulating the energy involved in electron transfer to and from the reservoir. A physical realisation is discussed by Perdew in [Citation7] taking the reservoir to be a metal with workfunction Φ having negligible coupling integrals with a far away molecule. Then at T=0 when μ=Φ drops below minus the molecular ionisation energy, evidently the ground-state energy corresponds to all molecules giving up an electron to the reservoir, i.e. become ionised. Similarly, as soon as μ would rise above minus the electron affinity, all molecules would turn into negative ions. At the points μ=I and μ=A, the number of electrons on the molecules is undetermined. This is also what Perdew observes (see Equation (27) of [Citation7]) for a molecule and a metal with work function Φ as reservoir. At those specific values of μ, many situations are possible, each characterised by specific probabilities to find N and N−1 electrons on a molecule (for μ=I) or N and N+1 electrons (for μ=A). Perdew [Citation7] discusses the wavefunctions one may construct for the reservoir-molecule system that yield specific probabilities for the possible integer number of electrons on a molecule. When there is negligible coupling between molecule and reservoir, the wavefunction construction can yield the same probabilities as a mixture of density matrices [Citation6, Citation7]. This yields a clear interpretation of the noninteger electron number and the corresponding energy as the averages of these quantities over all molecules of the dilute gas. For instance, if at μ=A m electrons (say) are taken up from the reservoir by the gas with a very large number M of molecules, they have to go (at zero temperature) to m molecules, the change of the average number of electrons per molecule is ΔN¯=m/M and the average energy change is ΔE¯=mA/M. One obtains the derivative of the average energy E¯/N¯=ΔE¯/ΔN¯=(mA/M)/(m/M)=A at N¯=N+ω electrons. Going to μ=I and N¯=Nω electrons, the derivative makes a quantum jump at integer N to I because of the quantum nature of the molecules. At integer N, there is not a defined derivative E¯/N¯ (because these considerations are at T=0 and do not take into account the proper T0 limit). It is to be emphasised that this behaviour of the average energy and its derivative with respect to the average number of electrons N¯ do not imply the same properties for any individual atom or molecule. No actual system can have a fractional number of electrons. Such systems are fictitious and we cannot calculate a wavefunction for them or get any information such as the energy for them.

The arbitrariness of the energy of a single atom or molecule at noninteger density implies that one may construct a functional which has some convenient behaviour. It is important then, of course, to refrain from conclusions about the physical (integer electron) system from the arbitrary choice. As example, let us construct a straight-line behaviour of the energy for a single molecule at noninteger electron number along similar lines as in Ref. [Citation6], but now we relinquish any statistical mechanical underpinning and are dealing frankly with the nonphysical system of a single molecule with fractional electron number. We first recall that following work by Levy [Citation22] and Valone [Citation39], Lieb [Citation8] formulated the Lieb functional in terms of a constrained search over an ensemble of N-electron pure state density matrices, Equation (Equation9). The Lieb functional has the important property of being convex. The further step can be taken of considering a density that integrates to a noninteger number of electrons, where the choice is made that the density be produced by taking an ensemble of density matrices of states Ψi of different electron numbers Ni, (32) Ev[ρ]=ρ(r)v(r)dr+F[ρ]F[ρ]minDˆρTr[Dˆ(Tˆ+Wˆ)]Dˆ=iλi|ΨiΨi|,iλi=1,Tr(Dˆρˆ(r))=iλiρi(r)=ρ(r),ρ(r)dr=N~=iλiNi,(32) where we indicate general possibly noninteger N with N~ (see Eschrig [Citation24] for a comprehensive treatment). With the generally assumed convexity of the energy as a function of number of electrons (E(N)(E(N+1)+E(N1))/2), the minimum energy at N~ in between, say, the integers N and N+1, will be straight line interpolation between E0N and E0N+1. So at N~=N+ω (33) Dˆmin=(1ω)|Ψ0NΨ0N|+ω|Ψ0N+1Ψ0N+1|E~(N~)=(1ω)E0N+ωE0N+1ρ(N~)=(1ω)ρ0N+ωρ0N+1(33) However, E~(N~) cannot be taken to be the exact energy of a noninteger electron system at T=0. Such a system does not exist, so it is not possible to determine its exact energy. The density ρ(N~) and energy E~(N~) represent just one out of the many possible ways to continue the density and energy of an N-electron system into the nonphysical fractional electron domain. This choice leads to a N~-derivative with the constant value A on the (N,N+1) interval, and I on the (N1,N) interval. Given the arbitrariness of E/N according to Section 2, this choice is possible, but it would not work in a density optimisation with a Lagrange multiplier for the ρdr=N constraint, since the derivative is not continuous at N. Equations (Equation32) and (Equation33) for a single N~-electron system lack the (statistical) physical interpretation of the straight-line behaviour of Refs. [Citation6, Citation7].

The representation of a fractional system by an ensemble of density matrices corresponding to different electron numbers leads to a jump in the KS potential when passing the integer N, as first noted in [Citation6, Citation7]. It is known that when we represent the ground-state density of a system, ρ0N say, with a KS determinant, the exact asymptotic decay of the density according to e22Ir should be matched by the decay of the KS density, which is that of the slowest decaying orbital density, the HOMO density, e22ϵHr. From this follows ϵH=I [Citation40–42]. If we consider the density ρ(N~), it is clear that as soon as ω>0 there is a contribution of the ground state (N+1)-electron density to ρ(N~). One can represent the density ρ(N~) with an ensemble of two KS densities, with the same KS potential and the same orbitals, but with different occupations of the orbitals. One KS density will have weight (1ω) and N occupied orbitals (ignoring spin), and the other one will have weight ω and (N+1) occupied orbitals. So the (N+1)-th orbital (the former LUMO) becomes occupied with ω electrons, and the slowest decay of the ensemble density will be according to e22ϵ~Lr. Here the orbital energy of the LUMO must be ϵ~L=A, at any ω>0, since the ionisation energy of the negative ion, which dictates the decay of ρ0N+1, is A. Now in general the LUMO level of the N-electron system, ϵL, will be lower than A, therefore the KS potential must shift up by a constant Δ over the atom or molecule (but not asymptotically) so that the LUMO level is raised to A:Δ=AϵL. If the N-electron system was open shell, then HOMO and LUMO would be at the same energy I (ignoring spin polarisation effects), and the upshift Δ of the potential in this case is IA, equal to the postulated discontinuity IA in the derivative of the energy at N~=N, (34) E~N~N~NE~N~N~N=IA=Δ(34) The upshift is therefore also often denoted as ‘the derivative discontinuity of the potential’, even in the case it is Δ=AϵL. If we start with a closed shell system, the LUMO level will be above ϵH=I and the upshift Δ=ϵ~LϵL<IA. It is still often denoted as derivative discontinuity. We return to the jump behaviour in Section 4.4

4.2. Locality and size consistency

We have encountered the straight lines (SL) energy as either the average energy according to statistical mechanics of the systems in a macroscopic sample at a properly taken T0 limit according to PPLB, or as an arbitrary behaviour for a single noninteger electron system at T=0 following from a particular construction for the energy in this nonphysical domain, see Section 4.1. We call the energy of (Equation33) the straight-line energy ESL. An interesting attempt to prove the linearity of such an energy (without recourse to statistical physics and a grand canonical ensemble) has been made by Yang, Zhang and Ayers (YZA) [Citation37, Citation38]. These authors considered the dissociation behaviour of molecules, extending similar arguments by PPLB [Citation6] (see next subsection), and find that linearity follows when the functional is required to be local. Since the property we call ‘locality’ is called ‘size-consistency’ by YZA, it is important to define terms here.

Size-consistency is a requirement on any correct theoretical treatment of a set of noninteracting (sub)systems (35) Size\ consistency: E(AB)=EA+EB(35) (the dots indicate long enough distance to make the interaction negligible). Size-consistency is a physical requirement formulated for a separation of a system into physical subsystems A and B with defined energies EA and EB. It does not apply for unphysical fragments lacking a defined energy, such as fragments with a noninteger number of electrons (which are necessarily entangled with other fragments). Size consistency should be obeyed by any proper functional (any proper quantum chemical method) since the energy is an extensive property.

The property of locality of a functional is something very different. It is not about (physical) systems but about functionals. A DFA is (semi-)local if it computes the energy with an energy density ϵ(r1) that uses at r1 only ρ(r1) or (in the semilocal case) only nearby densities, for instance when derivatives ρ(r1), 2ρ(r1), are used. LDA, GGA and meta-GGA are examples. We define also a property we call domain-locality. Domain-locality arises when a functional does not determine the energy density at point r from the electron density at (and in a neighbourhood of) r, but determines the energy contributions from separate (nonoverlapping) densities, in cases where the total density is built up from such disjunct pieces. A domain-local functional is for instance the ESL functional for a fractional electron density: with a density that is separated from the rest of the system, as for instance the atomic densities in dissociated H2+, it needs integration over that local domain to find the fractional number of electrons N~, which is an ingredient in the ESL energy determination. For a local functional (either strictly local, semi-local or domain-local), the following relation holds (the dots indicate nonoverlapping densities) (36) (strict,\ semi--,\ domain--)locality:EDFA[ρAρB]=EDFA[ρA]+EDFA[ρB](36) So locality implies that energy values are assigned even to possibly unphysical (noninteger) densities ρA and ρB. Nothing in the proof of the HK theorem makes us expect the HK functional to be local. The prime example of nonlocality of the exact functional occurs for a system of two open shell atoms X and Y at large separation in its singlet ground state. The Exc functional cannot be local since its derivative, the xc potential, must be upshifted by a constant over the atom with highest ionisation energy, the constant being determined by the ionisation energy of the other atom, however remote [Citation7, Citation14, Citation43–51]. Another example is found when a separated fragment density ρA does not correspond to a physical system, for instance when it does not integrate to an integer number of electrons. In that case the exact functional cannot be local. The simplest and very well-known case of unphysical fragments is H2+ at long distance, with electron charge densities of (1/2)e at each site, or an array of P protons with very large distances and 1 electron, with charge densities of (1/P)e at each site. For such a one-electron system, the Exc energy has to cancel the Hartree term, i.e. ρ(r)ϵxc(r)dr=(1/2)ρ(r1)ρ(r2)/r12dr1dr2, so ϵxc(r1)=(1/2)ρ(r2)/r12dr2, manifestly nonlocal.

Now one could try to approximate the correct result, for instance for a two-site system, with a (domain-)local functional, (37) F[ρ]=?Flocal[ρ]=Flocal[ρ(A)]+Flocal[ρ(B)](37) This local functional delivers the correct number (but only at infinite distance, outside the range of the Coulomb potential) if it is defined, for fractional electron number, to be a linear interpolation between the defined functionals for integer electron systems. This is how the ESL functional could be applied. First it is to be considered as a domain-local functional, which is to be applied for each of the two nonoverlapping densities ρA and ρB separately. These are, in the H2+ example, both 1/2-electron densities. For example, ρA is the average of a one-electron density ρH(A)=|1sH(rRA)|2 and a zero electron density ρ0(A)=0. For such a local noninteger electron density, ESL defines the energy as the average of the one-electron and zero electron energies, (38) FSL[ρAρB]=FSL[ρ(A)]+FSL[ρ(B)](locality)FSL[ρ(A)]=FSL12ρH(A)+12ρ0(A)12F[ρH(A)]+12F[ρ0(A)](38) where the first line expresses the requirement that the SL functional is to be applied as a domain-local functional. In the case of P proton sites with one delocalised electron (39) FSL[ρ(p)]=FSL1PρH(p)+P1Pρ0(p)1PF[ρH(p)]+P1PF[ρ0(p)](39) (with ρH(p)=|1sH(rRp)|2 and ρ0(p)=0). Equation(Equation39), where P can be arbitrarily large, shows that the interpolation has to be linear. This does not imply that a fractional electron system is physical. It only shows that we can, in the limit of noninteracting subsystems, try to ignore the nonlocal nature of the exact functional and introduce another functional, not exact, which is taken to be local (as defined in Equation (Equation36)) and which delivers the correct energy for the total (fully dissociated) system. We have to endow it with the following properties: (a) it accepts unphysical fractional electron densities (since such fragment densities may occur); (b) it interpolates linearly between the functionals of the nearby integer densities (Equation (Equation39) with variable P shows that linearity is required [Citation37]).

YZA [Citation37] have demonstrated more generally (for the case of degenerate subsystems) that the property of locality of a functional (in the case of disjoint subsystems) leads to the requirement that the functional has to be defined for noninteger densities (which the noninteracting subsystems may possess). It is then deduced, as above, that the behaviour of the energy for fractional electron densities should be linear. However, we wish to caution that locality is not a property of the exact functional. The requirement of locality (Equation36) is called size consistency in Ref. [Citation37], or functional size consistency. But the two should not be confused: size consistency is the property (Equation35) (this is called energy size consistency in Ref. [Citation37]). Size consistency is a property the exact functional will possess (as any bona fide theory must), and it should be required of approximate functionals (recall the requirement that A and B of Equation (Equation35) be physical systems). Locality (Equation36) is not a property of the exact functional, and can only be a property of approximate functionals. The requirement of size-consistency, in particular in the case of degenerate subsystems, and the difficulties involved in obeying this requirement for approximate (local) functionals, have been analysed by Savin [Citation52, Citation53] and Gori-Giorgi and Savin [Citation54]. Locality is not a requirement for size consistency. The exact functional will be nonlocal and must be size consistent.

4.3. Dissociation into integer electron fragments and (non)locality

A pleasing property of the SL energy ESL (i.e. E~(N~) of (Equation33)) is that it affords correct dissociation of molecules into integer electron fragments (atoms or molecular fragments), while many LDFA's yield fractionally charged fragments. This problem was originally raised by Slater with the dissociation of NaCl as example, cf. Ref. [Citation2, Ch. 4], and was shown to be solved by a SL energy in Ref. [Citation6]. However, it should be recognised that this correct dissociation behaviour only follows if the assumption is made that the functional is applicable in local fragments separately (domain-locality). We will review in this section how an ESL leads to dissociation of a diatomic into integer electron atoms. In the next subsection, we will then demonstrate that the straight line behaviour of the energy still cannot be ‘the exact functional’. The derivative discontinuity jump of the KS potential discussed at the end of Section 4.1 for the case of a single molecule, does not have the right magnitude in the case of dissociation of a heteronuclear diatomic. The right jump of the KS potential over an atom in a dissociated system has a nonlocal origin, which is why a locality Ansatz has to fail.

Let us denote the energy for fractional electron number E~(N~), where N~=N+ω,0<ω<1 or 1<ω<0 is the fractional electron number and E~ is linear on the two intervals. When a system XY separates into two parts X and Y with non-overlapping densities ρ(X) and ρ(Y), not necessarily integer, it is assumed the energy can be obtained as the sum of subsystem energies determined by the subsystem densities: E[ρ(XY)]=E~[ρ(X)]+E~[ρ(Y)] (assumption of locality of the functional). If the linear E~ is applied to each subsystem X and Y separately, dissociation into integer electron systems has the lowest energy. The argument runs as follows. Suppose the starting electron distribution is ‘wrong’ in the sense that A(Y)>I(X) (e.g. Y =Na+ and X=Cl). With the linearity for the energy at fractional electron number N~ and constant E~/N~=I or A for N~<N and N~>N, respectively, we will have that a fractional charge transfer δN~ will occur from X to Y with energy change (40) ΔE=E~[X]N~(δN~)+E~[Y]N~(δN~)=I(X)δN~A(Y)δN~<0(40) If e.g. Y =Na+, the 3s level of Na+ will, due to the derivative discontinuity jump of the KS potential, jump up to the 3s level of neutral Na, at I(Na), for an arbitrary amount ω (0<ω<1) of electron transfer, while the 3pσ of Cl1+ω remains at I(Cl), see Figure (a). Since the energy derivatives remain constant at A(Y) and I(X) all along the fractional charge on X on the (NX,NX1) interval and on Y on the (NY,NY+1) interval, transfer will continue until a complete electron has transferred from X to Y. So separation of X and Y will occur into two integer electron systems. However, this is not a stable end result for the SL energy model, see next section.

Figure 2. (a) Na+Cl: The constant frontier orbital energies of Na+ and Cl for a straight-line energy during the charge equalisation process from charged fragments to neutral atoms. After the discontinuity jump of the Na+, 3s level has occurred at transfer of any amount ω (0<ω1) electrons to the 3s level of neutral Na at I(Na), it is still below the 3pσ levels of Cl (α and β level indicated with up and down arrows). (b) NaCl: The dissociated system of two neutral atoms Na and Cl and the derivative discontinuity jump upon a small electron transfer δ from the Na-3sα orbital to the lower lying Cl-3pσα orbital. The discontinuity jump puts Cl-3pσα at A(Cl)=I(Cl), above the Na-3s level. Degeneracy of Na-3sα and Cl-3pσβ does not result.

Figure 2. (a) Na+Cl−: The constant frontier orbital energies of Na+ and Cl− for a straight-line energy during the charge equalisation process from charged fragments to neutral atoms. After the discontinuity jump of the Na+, 3s level has occurred at transfer of any amount ω (0<ω≤1) electrons to the 3s level of neutral Na at −I(Na), it is still below the 3pσ levels of Cl− (α and β level indicated with up and down arrows). (b) NaCl: The dissociated system of two neutral atoms Na and Cl and the derivative discontinuity jump upon a small electron transfer δ from the Na-3sα orbital to the lower lying Cl-3pσα orbital. The discontinuity jump puts Cl-3pσα at −A(Cl)=−I(Cl−), above the Na-3s level. Degeneracy of Na-3sα and Cl-3pσβ does not result.

4.4. Nonlocality and dissociation of an electron pair bond

In spite of the correct dissociation into integer electron fragments, the linear energy behaviour and the inherent derivative discontinuity of the KS potential do not lead to completely correct dissociation. The problem is that the linear energy E~ has to be applied in combination with a (domain-)locality Ansatz. The systems X and Y are treated as independent subsystems to which locally the linear energy behaviour can be applied if they have noninteger electron number. The typical result of the dissociation discussed so far will be two neutral atoms (Y=Na[(3sα)1], X=Cl[(3pπ)4(3pσβ)1] in the given example). But this is not a stable situation. For the neutral atoms Y=Na and X=Cl, we now have that I(X)>I(Y), see Figure (b). For the atom X, the presence of another open shell atom Y far away in the universe, with a smaller ionisation potential I(Y) than I(X), makes it necessary that the KS potential exhibits an upshift all over the region of X (but not asymptotically) by the constant C=I(X)I(Y). This upshift has been identified a long time ago [Citation7, Citation43] and has received considerable interest [Citation14, Citation44–51]. Its origin is in the response part of the KS potential [Citation45]. That is understandable: the conditional amplitude that is underlying the response potential describes the strong correlation in this case: when one electron is on Y the other electron stays away on X and does not (not even partially) delocalise towards Y. The potential step following from the conditional amplitude arises because of the long-range correlation, a manifestly nonlocal effect. It is obvious that this step of the KS potential over atom X requires a nonlocal functional – the presence of another atom (Y) very far away and the magnitude of its ionisation energy I(Y) cannot be described with only the local density ρ(X) available.

The potential jump I(X)I(Y) over atom X makes the unpaired spin levels of both atoms degenerate. A closed-shell KS molecular orbital solution can result, with doubly occupied HOMO having 50–50 mixture of X and Y character. This KS solution will have equal probability of spin α or β on Y and X, in agreement with the exact wavefunction.

This correct solution does not result with the standard derivative discontinuity jump of the KS potential. Suppose that the dissociation to neutral atoms described in Section 4.3 has been completed, see Figure (b). As soon as a full electron has been transferred from Cl, the orbital energies of the generated neutral atoms revert to their ‘normal’ values (I(Cl) for the 3pσβ on Cl, I(Na) for 3sα on Na). So the empty Cl-3pσα will be below the occupied Na-3sα. Suppose that on a further iteration in an SCF calculation some number δ of α electrons is transferred from the higher lying Na-3sα to the lower lying unoccupied Cl-3pσα (it could be 1 electron if the transfer is not damped). The ‘derivative discontinuity’ jump of the KS potential will put Cl-3pσα at A(Cl) for δ electrons. But this will typically be a larger upshift than the required I(Cl)-I(Na), so this is not a stable situation: the electrons will on the following iteration be sent back to the now lower lying Na-3sα level. If actually carried out, this calculation will lead to infinite oscillation rather than a self-consistent stable solution because the desired situation with 50–50 mixing of X(=Cl) and Y (=Na) orbitals has to come from perfect degeneracy of X and Y orbitals, which is not achieved if the KS potential plateau of C=I(X)I(Y) over atom X is not generated. And it is not generated because the E~(N~) functional lacks the non-locality (i.e. knowledge of Y ) that is required to determine the right upshift of the KS potential over the X atom. The ‘derivative discontinuity’ jump of the KS potential over the X atom of magnitude I(X)A(X) is too large and prevents a stable proper dissociation situation, see Figure (b). The jump should not be I(X)A(X) (the locally determined derivative discontinuity jump) but I(X)I(Y) (dictated nonlocally) in order to obtain proper delocalisation of the α and β electrons over X and Y.

The problems discussed in this section and Section 4.2 arise from attempts to treat dissociated systems with neglect of the fact they are still entangled. Entanglement does not show up (or very weakly) in the local densities, but it does in the two-particle density matrix (pair density, conditional amplitude and conditional probabilities, for instance), and manifests itself in the nonlocality of the functional. Nonlocal functionals leading to proper dissociation can be formulated, but they typically require orbital dependency. Refs. [Citation55–58] give examples in a KS context, and in a DMFT context they are commonplace [Citation59–64]. A genuine density functional treatment of strongly correlated electrons, which is exact in the limiting case of dissociated H2, is provided by the strictly correlated electrons (SCE) functional [Citation65–68].

5. Meaning and applications of the relation E/ni=ϵi for Hartree–Fock, Xα and approximate density functional energies

While orbital occupation numbers do not have a place in exact Kohn–Sham DFT, in particular not when noninteger values are considered, this is different in many approximate total energy expressions, corresponding to approximate electronic structure models such as Hartree–Fock, or density functional approximations such as Xα, LDA, GGAs (LDFAs in general) and hybrid functionals. Even if fractional occupation numbers do not have a physical meaning, they still yield mathematically defined energy values, which allow mathematical manipulations to extract physically meaningful quantities. An example is provided by the use of occupation numbers in the Hartree–Fock energy, as originally done by Slater [Citation1, Citation2, Citation69],

(41) EHF1=σ=α,βpnpσφpσ|122+vext|φpσ+12σ,τ=α,βp,qnpσnqτ×φpσ(x1)φqτ(x2)(1P12)φpσ(x1)φqτ(x2)|r1r2|×dx1dx2(41)

We use the more general case of unrestricted Hartree–Fock. The summations over σ and τ run over the spin functions (α,β). The Hartree–Fock model is defined as the lowest energy determinantal wavefunction. Orbitals occur in the determinant or not. Fractional occupation numbers have no meaning in the theory, an orbital cannot be partly in the determinant. The introduction of occupation numbers can be seen as a convenient notational device, the occurrence of a spinorbital being determined by the occupation number being 0 or 1. The summations are over all orbitals (we use the conventional notation of indexing occupied orbitals with i,j,k,l,, unoccupied orbitals with a,b, and general orbitals with p,q,r,). The presence or the absence of an orbital is governed by np=1 or 0. In order to indicate that we extended the Hartree–Fock energy expression with occupation numbers as additional variables, for which we choose linear dependency (power 1), we denote this energy expression as ‘HF1’. Alternative expressions with occupation numbers are possible, see Equation (Equation46). The energy is usually optimised under orbital variation in EHF1 with fixed occupation numbers, where the orthonormality of the orbitals is treated with Lagrange multipliers. Treating the occupation numbers as variables, the optimisation has to be carried out under the constraints {niα=1,iHα;naα=0,a>Hα},{njβ=1,jHβ;nbβ=0,b>Hβ}, where Hα and Hβ are the highest occupied α and β spin orbitals, respectively. It is elementary to derive from Equation (Equation41) the Slater relation (42) EHF1npσ=ϵpσ(42) The constraints on the occupation numbers can be treated with Lagrange multipliers because Equation (Equation41) defines the objective function EHF1 also outside the feasible values of 0 and 1 for the occupation numbers (in contrast to the energy according to HK functional of the density). The Hartree–Fock model itself and the Hartree–Fock energy have no physical meaning outside the feasible values niσ=1,naσ=0. The choice for the value of the objective function EHF outside the feasible domain could be made differently than in EHF1, see below. Optimisation with Lagrange multipliers for the constraints (including the well-known constraints of orbital orthonormalisation) requires stationarity of the Lagrangean (43) LHF1=EHF1σp,qϵpqσφpσ|φqσδpqiHσλiσ(niσ1)a>Hσλaσ(naσ0)(43) The values of the Lagrange multipliers for the occupation number constraints follow from the conditions LHF1/npσ=0, (44) LHF1/npσ=0λpσ=EHF1npσ(44) The force of constraint, needed to keep the occupation number at its prescribed value, is proportional to the corresponding Lagrange multiplier λpσ (in this case equal to it, since (npσ1)/npσ=(npσ0)/npσ=1). From Equations (Equation44) and (Equation42) follows that (45) λpσ=ϵpσ(45) The fact that the Lagrange multiplier for the constraint of orbital normalisation, ϵpσ, is equal in magnitude to the force of constraint for constant occupation number, makes physically sense: regardless of the fact that increase of npσ to a value >1 does not have physical meaning in the Hartree–Fock model, such increase in (Equation41) would lower EHF1, in particular for large negative ϵpσ (core orbitals).

We may recall [Citation3] that a different choice could be made for the dependence of the energy on occupation numbers. Since the value of the objective function, EHF outside the domain to which the occupation numbers are constrained is arbitrary, one can choose the function behaviour there at will. Of course the force of constraint at the optimum point will depend on the chosen behaviour outside the domain, see Section 2. We can illustrate this by using for Hartree–Fock the energy (46) EHF2=i,σ(niσ)2φiσ|122+vext|φiσ+12σ,τ=α,βi,j(niσnjτ)2×φiσ(x1)φjτ(x2)(1P12)φiσ(x1)φjτ(x2)|r1r2|×dx1dx2(46) The total energies EHF1 and EHF2 are exactly the same at the values niσ=1 and naσ=0 (i.e. on the feasible domain), only the behaviour outside that domain differs. Optimisation with the same constraint on the occupation numbers being 0 and 1 leads to exactly the same orbitals and orbital energies and the same total energy. Of course, the derivatives with respect to npσ are different (47) EHF2/npσ=2npσϵpσ(47) Obviously, the force of constraint λiσ=2niσϵiσ for occupied orbitals (to keep niσ=1) is larger, since the energy now changes with the square of the occupation number (in the unphysical domain outside niσ=1). The derivative is now larger and then a larger force of constraint is needed to cancel it. The unoccupied orbitals on the other hand now have zero force of constraint since the required zero occupation number is at the minimum of the quadratic (naσ)2 behaviour for aunocc.

It has become popular to move to hybrid functionals with a certain percentage of exact (Hartree–Fock) exchange, in addition to a pure density functional for correlation and the rest of the exchange. When such an energy expression is optimised by orbital variation, as in the traditional Hartree–Fock approach, one will obtain a partly nonlocal potential (a fraction of the Hartree–Fock exchange operator). This procedure is called Hartree–Fock–Kohn–Sham [Citation26] or generalised Kohn–Sham (gKS) [Citation70, Citation71]. Suppose now that we enter occupation numbers in the exact-exchange part of the energy in the same way as in Hartree–Fock, and also write the electron density and the kinetic energy and electron-nuclear energies as linear expressions in the occupation numbers, all as in the energy EHF1. It is elementary to show that in that case the Slater relation Ehybrid1/ni=ϵi also holds (we use again hybrid1 to signal that linear dependence on occupation numbers has been introduced). We recall that the derivation of the Slater relation only uses the chain rule for taking the derivative of the density-dependent part ExcDFA[ρ] and, if present, of the one-particle reduced density matrix (1RDM) part ExcDFA[γ] of the total energy, cf. [Citation3]. This relation has also been called the generalised Janak theorem [Citation71, Citation72]. Assume further that a ground-state calculation is performed, constraining the occupation numbers to 1 and 0. [If we would relieve that constraint and allow fractional occupation numbers, the functional would no longer be distinct from a functional of the one-electron reduced density matrix (1RDM). The exact 1RDM functional has been shown by Gilbert [Citation36] to have complete degeneracy of the orbital energies for all partially occupied orbitals.] If we now consider the derivative with respect to the occupation number of the LUMO, we are dealing with an infinitesimal increase of the total number of electrons. This derivative with respect to occupation number then appears to be the same as the derivative with respect to total electron number, Ehybrid1/nL=(E/N)+ [Citation71, Citation73]. With the straight-line energy, the latter is A. This has led to the conclusion that the LUMO orbital energy for an accurate hybrid functional must be close to A. However, this is questionable. In the first place, the derivative E/N is not defined in exact DFT, as discussed in Section 2. It is only A with a particular choice of the energy behaviour for noninteger N, namely the straight-line energy E~(N~), Equation (Equation33). In the second place, the derivative with respect to occupation number depends on the way occupation numbers are introduced in a DFA. The energy at fractional occupation numbers is a mathematical device, it does not correspond to a true physical system and therefore using it to infer properties of the actual physical system (at integer electron number) is questionable. A LUMO close to A is typical for the Hartree–Fock model and can for a hybrid be obtained with a very high percentage of exact exchange [Citation15]. For common hybrids, with exact exchange percentages in the order of 20–30%, the LUMO is much below A [Citation3]. It is remarkable that in solids (but not in molecules) the ϵLDFA(solid) is equal to the DFA calculated (total energy difference) ADFA(solid) for DFAs that obey the Slater relation [Citation72]. This difference between solids and molecules is not self-evident. It requires for its proof that the ‘solid state limit’ to arbitrarily large size of the system can be taken. This is discussed at some length in Ref. [Citation3], see also Ref. [Citation72].

6. Summary

We have applied the theory of functional derivatives with constraints and the Lagrange multiplier technique to the problem of optimisation of the density in the Hohenberg–Kohn functional with the ρdr=N constraint. We have highlighted the arbitrariness of a part of the total functional derivative, namely the derivative of the total energy with respect to electron number, E/N. It stems from the restricted domain on which F[ρ] is defined. The Hohenberg–Kohn functional F[ρ] (and the Levy–Lieb and Lieb functionals) are only defined on the domain of N-electron densities. As a corollary, Janak's theorem does not exist in (exact) Kohn–Sham DFT. The essential difficulty is with the nonexistence of fractional electron systems. Any change of an orbital occupation number from integer value (1 or 0) entails a change of electron number from the integer value. There are no wavefunctions for such systems, not for interacting electrons and not for the noninteracting electrons of the Kohn–Sham system. Therefore, there are no exact energies and no exchange–correlation energies Exc for noninteger N (at least not for single, finite electron systems (atoms and molecules) at T=0).

For application of the Euler–Lagrange equation to find the optimum N-electron density ρ0N for which the Hohenberg–Kohn energy functional Ev[ρ] attains its minimum value, the arbitrariness of E/N can be solved in two ways. The functional derivative δEv[ρ]/δρ(r) should be taken with the constraint ρ(r)dr=N (N integer). The constraint derivative δEv[ρ]/δNρ(r) should be zero at the ground-state density ρ0N. Alternative one can make a suitable choice of the E/N derivative and apply the Lagrange multiplier technique. The choice should be for a continuous and finite derivative, otherwise it would impede the application of the Lagrange multiplier technique for density optimisation. Choices in the literature, such as FL[ρ]=+ in the functional analysis literature of DFT [Citation8–11, Citation24] or a discontinuous derivative (different derivatives to the electron-rich and electron-deficient sides of the integer), are not suitable in this sense.

It is important that we are dealing here with single quantum systems at T=0 (for which the majority of DFT calculations in chemistry are being done). Perdew et al. [Citation6] have studied the statistical mechanics of systems with fluctuating electron number with a constrained search over ensembles of integer electron density matrices, leading to linear energy behaviour in between integers for the average energy at average electron number. It is an altogether different matter to define the energy of a single molecule with a noninteger electron number at T=0 by a mixture of density matrices of ground states with different electron numbers, see Equations (Equation32) and (Equation33). That procedure represents just one possible extension of the energy functional into the nonphysical fractional electron density domain. Such an extension is essentially arbitrary, cf. Section 2. The linear choice, which we have denoted E~(N~), has the interesting property that it leads to the proper dissociation of molecules into neutral atoms, see Section 4.3. We have also emphasised that such success is achieved by treating E~(N~) as a local functional, applicable in the local domains of the densities of the fragments separately. This at the same time prevents such a functional from being ‘exact’ because fully correct dissociation requires a nonlocal functional, see Section 4.4. In order to obtain the proper KS wavefunction for a heteronuclear dissociated diatomic molecule XY with I(X)>I(Y), the derivative discontinuity of the potential, which is an inherent property of E~(N~), and which would be I(X)A(X) on atom X (i.e. only dependent on X), is wrong; because of nonlocality effects it should be I(X)I(Y), i.e. be dependent on both X and Y.

Derivatives of the energy with respect to occupation numbers do not exist in exact KS DFT. However, if occupation numbers are introduced as additional variables in the total energy such derivatives can be defined for approximate energy expressions, such as EHF1 (Equation41) and EHF2 (Equation46), and similarly for Xα, LDA, GGA and hybrid functionals. At fractional values of these occupation numbers, one obtains mathematically defined energies, although without physical meaning. For such energies, Slater had already obtained the relation between derivative with respect to orbital occupation number and orbital energy which bears his name and which Janak has tried to extend to (exact) DFT. The Slater relation is particularly useful when total energies are hard to calculate but orbital energies are readily available [Citation3]. These results do not imply that an extension to exact Kohn–Sham DFT is possible. In order to avoid confusion, application of the Slater relation with the approximate DFT energy expressions mentioned above should be denoted as such, and not be called application of Janak's theorem. Janak's theorem is an attempt to extend Slater's relation to exact Kohn–Sham DFT, which has been questioned in Section 3.

Acknowledgements

I am grateful to Paola Gori-Giorgi and Andreas Savin for helpful discussions and keen interest.

Disclosure statement

No potential conflict of interest was reported by the authors.

References

Appendix 1.

N-conserving functional derivatives

A few aspects of constraints on functional derivatives are reviewed. We refer to Gál [Citation74, Citation75] for detailed discussion. In the present case, the constraint of conservation of particle number, ρ(r)dr=N is specifically addressed. We follow Parr and Bartolotti [Citation19] in separating the N-dependency and shape dependency of the density by writing it as N times a shape function σ(r) which integrates to 1, (48) ρ(r)=Ng(r)g(r)dr=Nσ(r);σ(r)dr=1δρ(r)=Nδσ(r)+σ(r)δN=δNρ(r)+δσρ(r)δNρ(r)dr=0,δσρ(r)dr=σ(r)drδN=δN(48) where g(r) is defined up to a scaling constant; it can be g(r)=ρ(r). Subscripts N and σ denote constant electron number N and constant shape function σ(r), respectively. The constrained derivatives for functionals are analogous to partial derivatives for functions. δσ should be orthogonal to σ otherwise the Nδσ term contains part of the electron number variation, which should follow from σδN (if δσ has a component aσ along σ, Nδσdr=Naσdr=Na).

Define the changes δNE[ρ] of the functional E[ρ] as those variations of the energy that are induced by N-conserving density changes δNρ(r), (ρ(r)+δNρ(r))dr=N, (A2) δNE[ρ]δEδρ(r)δNρ(r)dr.(A2) Application of the Lagrange multiplier technique requires variation of E to be defined for arbitrary variations δρ, so if we want to break down the total derivative δE/δρ into norm-conserving and shape-conserving parts, these have to be defined as operators acting on arbitrary δρ, (A3) δEδρδρdr=δEδNρ(r)+δEδσρ(r)δρ(r)dr(A3) The constraint derivative δE[ρ]/δNρ(r) is defined as that operator (kernel of an integral operator) that delivers δNE from a general δρ, by picking out the δNρ(r) part of δρ(r) (51) δNE=δEδNρ(r)δρ(r)dr=δEδNρ(r)(δNρ(r)+δσρ(r))dr=δEδNρ(r)δNρ(r)dr+δEδNρ(r)δσρ(r)dr=δEδNρ(r)δNρ(r)dr+0.(51) The fact that the last term on the second line is zero is an essential element in the definition of δE/δNρ – the total δNE stems from and is determined only by the N-conserving part of δρ. The shape-conserving part, which only allows a change of N, cannot contribute to δNE. This is an important point. In the literature [Citation19, Citation26–28], it has been an issue that δE[ρ]/δNρ(r) would be ambiguous, containing an undefined constant. One must then be careful to note that this stems from restricting the domain on which the derivative is defined to the set of N-conserving density changes δNρ(r) and the observation that CδNρ(r)dr=0 [Citation9, Citation20]. However, in the application of the Lagrange multiplier method (see text), we need derivatives (operators) that are defined for action on an arbitrary δρ, not just on the domain of N-conserving δNρ. So ‘orthogonality’ of δE/δNρ(r) on δσρ(r), (δE/δNρ(r))δσρ(r)dr=0, is a necessary part of the definition of δE/δNρ(r). This property would be destroyed by an additional constant C:Cδσρ(r)dr=CδN. A constant would give δE/δNρ(r) an unwarranted component in the space ‘orthogonal’ to the set {δNρ(r)}. This is analogous to adding to //f in Figure  a component Cn in the direction n perpendicular to //f, so it would clutter df=(//f+Cn)¢erdotδl for a general displacement δl with contributions Cn¢erdotδl from the perpendicular component δl.

In the same way, for a complete definition of δE/δσρ we must have (δE/δσρ(r))δNρ(r)dr=0. Now for arbitrary δρ (A5) δEδρ(r)δρ(r)dr=δEδρ(r)δNρ(r)dr+δEδρ(r)δσρ(r)dr=δEδNρ(r)δNρ(r)dr+δEδσρ(r)δσρ(r)dr=δEδNρ(r)δρ(r)dr+δEδσρ(r)δρ(r)dr=δEδNρ(r)+δEδσρ(r)δρ(r)dr,(A5) consistent with the identity (A6) δEδρ(r)=δEδNρ(r)+δEδσρ(r).(A6) Now a derivative with constant shape function σ can only involve a change of N, so we expect (A7) δEδσρ(r)=EN(A7) and with the chain rule (A8) EN=δEδρ(r)ρ(r)Ndr=δEδρ(r)ρ(r)Ndr.(A8) A derivation can be given for δE[ρ]/δNρ(r) employing the factorisation of ρ(r) as Ng(r)/g(r)dr (see [Citation20]) which leads to (A9) δE[ρ]δNρ(r)=δE[ρ]δρ(r)δE[ρ]δρ(r)ρ(r)Ndr,(A9) in perfect agreement with Equation (EquationA8), so that with Equations (EquationA6)–(EquationA9) (A10) δE[ρ]δρ(r)=δE[ρ]δNρ(r)+EN.(A10) Note that indeed (δE/δσρ(r))δNρ(r)dr=(E/N)δNρ(r)dr=(E/N)Nδσ(r)dr=0. The fact that E/N is not defined by the Hohenberg–Kohn theorem, is not a problem in DFT, since only the energy changes δNE[ρ] are required. The fact that any constant C has no effect when acting on a feasible δNρ(r) makes sure that the essential change δNE[ρ] is defined whatever the (chosen) value of E/N: (A11) δE[ρ]δρ(r)δNρ(r)=δE[ρ]δNρ(r)+ENδNρ(r)=δE[ρ]δNρ(r)δNρ(r)=δNE[ρ].(A11)

Appendix 2.

An arbitrary constant in the constrained derivative?

Starting with the contribution by Parr and Bartolotti [Citation19] remarks have been made in the literature that the constrained derivative (A12) δE[ρ]δρ(r)N(A12) which we have written for the sake of concise notation as δE[ρ]/δNρ(r), following Gál [Citation20], would only be defined up to an arbitrary constant. The origin is the same pitfall we mentioned in Section 2. Parr and Yang [Citation26, p. 83] write Equation (Equation11) with an N-conserving derivative (A13) δE[ρ]δg(r)=δE[ρ]δρ(r)Nδρ(r)δg(r)dr=0(A13) where [δE[ρ]/δρ(r)]N is ‘a functional differentiation with N fixed’ (what we denote δE/δNρ(r)). Application of the chain rule means that Equation (EquationA13) should in principle be written with the full derivative δE[ρ]/δρ(r), as has been done in Equation (Equation11), but since δρ(r)/δg(r) will deliver N-conserving variations δNρ(r), the use of the constraint derivative in this case is also correct. Using Equations (Equation12) and (EquationA13) at the solution point (ρ0N), one then obtains Equation (4.4.7) of [Citation26, p. 83], (A14) δE[ρ0N]δρ(r)N=δE[ρ0N]δρ(r)Ng(r)g(r)drdr(A14) from which it was concluded that [δE[ρ0N]/δρ(r)]N must be an undefined constant (the constant cannot be fixed from (EquationA14)). However, we have seen in Section 2 that at the solution point this N-conserving derivative is not an arbitrary constant (the pitfall mentioned there) but is zero. There is not a problem with Equation (EquationA14), it is correct, but one should be careful with conclusions drawn from it. It is possible to conclude that the N-conserving derivative at the solution point is a constant, but it may be that on other grounds it is known that this constant is not arbitrary but has a specific value. The constrained derivative has been found to be zero at the solution point ρ0N, see Section 2. That is of course perfectly consistent with (EquationA14). If on the other hand the full derivative would have been used in Equation (EquationA13), as we did in Equation (Equation11), this would also have emerged in Equation (EquationA14). We have noted in Section 2 that it is this full derivative that is equal to a constant at ρ0N, which is again compatible with equation (EquationA14) (for the full derivative δE[ρ0N]/δρ(r) instead of the constrained δE[ρ0N]/δNρ(r)). As we have seen, the constant in that case is indeed arbitrary, being E[ρ0N]/N. This difference – the constrained derivative δE[ρ0N]/δNρ(r) being zero according to our present analysis, see discussion below Equation (Equation13), but equal to an undefined constant according to Ref. [Citation26] – goes back to Parr and Bartolotti [Citation19], who first proposed there to be an arbitrary constant in [δE[ρ0N]/δρ(r)]N, which has since often surfaced in the DFT literature.