![MathJax Logo](/templates/jsp/_style2/_tandf/pb2/images/math-jax.gif)
ABSTRACT
We give an overview of our formalizations in the proof assistant Isabelle/HOL of certain irrationality and transcendence criteria for infinite series from three different research papers: by Erdős and Straus, Hančl, and Hančl and Rucki. Our formalizations in Isabelle/HOL can be found on the Archive of Formal Proofs. Here we describe selected aspects of the formalization and discuss what this reveals about the use and potential of Isabelle/HOL in formalizing modern mathematical research, particularly in these parts of number theory and analysis.
1 Introduction
The libraries of formalized mathematics in various proof assistants (or interactive theorem provers) have been growing for years. However, the field is still far from the goal propounded in 1994 as the QED Manifesto [Citation5] “to build a computer system that effectively represents all important mathematical knowledge and techniques.” We are light years from the vision of having an interactive tool that could “converse” with human mathematicians to aid in the discovery of new results, as colorfully described by Timothy Gowers [Citation17]. Our goal within the ALEXANDRIA Project at Cambridge [Citation37] is to contribute to making proof assistants (and more specifically Isabelle/HOL) more useful to research mathematicians. To gather information about what specifically needs to be done, we have been formalizing material to contribute to the Isabelle/HOL LibrariesFootnote1 and the Archive of Formal Proofs (AFP).Footnote2 As research mathematics is our main target, and given that a considerable amount of basic mathematics has already been formalized in Isabelle/HOL over the years, here we wish to explore the following question:
How receptive to formalization (particularly in Isabelle/HOL) is mainstream published mathematical research?
In this paper we are not working towards the ambitious goal of formalizing the proof of a landmark result. Nor are we concerned with standard material from undergraduate or graduate textbooks. Those proofs have been thoroughly studied and checked, and repeatedly reworked and polished by many people over the years. Instead, we are formalizing mainstream mathematical research from journal papers in their raw form, not reworked or clarified as textbook proofs normally are. To explore the question above (within a specific case study), we ask questions such as these:
To what extent does the current Isabelle/HOL library cover the necessary preliminaries?
How helpful is the automatic proof tools?
How much longer than the original proofs are the formalized proofs (i.e. the de Bruijn factor [Citation40])?
Are we going to discover any issues or gaps in the proofs?
Clearly, the answers to these questions depend on the chosen proofs. Here, we attempt some case studies in number theory, on transcendence and irrationality criteria for infinite series. In particular, we choose several results from three research papers by Erdős and Straus (1974) [Citation13], Hančl (2002) [Citation23], and Hančl and Rucki (2005) [Citation24]. This line of research makes heavy use of standard background material from analysis, e.g., convergence tests for series and (for Erdős and Straus [Citation13]) certain results about prime numbers. The material from Hančl and Rucki [Citation24] applies Roth’s theorem on rational approximations to algebraic numbers.Footnote3 Inevitably, we had to pick topics in which we a priori knew that at least some of the preliminaries were already available in Isabelle/HOL. The material we chose is mathematically interesting in its own right, and moreover—unlike most results formalized nowadays—it comes from fairly recent research papers instead of textbooks.
Paul Erdős was keenly interested in establishing criteria for the irrationality and transcendence of infinite series [Citation11, Citation12, Citation14]. Several techniques for showing the transcendence of series are given by Nishioka [Citation33], who presents, among other, methods originated by Mahler [Citation31]. Later, Sándor [Citation39], Nyblom [Citation34, Citation35], and Hančl [Citation20–22] obtained many more results in this area.
In this article, we merely present in standard mathematical language the statements (without the informal proofs) of the theorems that we formalized and afterwords we discuss our formalization in Isabelle/HOL focussing on its most interesting aspects. For the informal proofs, the reader may refer to the respective papers [Citation13, Citation23, Citation24]. Our full Isabelle/HOL formalizations can be found online, on the AFP [Citation28–30]. As we had to fill in many intermediate arguments that were implicit in the original proofs, providing more details and clarifications, our formalizations may serve not only as case studies to explore the question of how receptive to formalization in Isabelle/HOL mathematical research is and as proof verification, but also as helpful supplementary material for a detailed study of the original proofs.
The plan of this article is as follows: in the next section we give a brief introduction to Isabelle/HOL. In Sections 3–5, we present the material formalized and discuss the main challenges and lessons learned through our formalizations for each article, respectively. In Section 6, we briefly discuss the de Bruijn factors of the formalizations and finally, in Section 7 we present our conclusion.
2 Isabelle/HOL
IsabelleFootnote4 is an interactive theorem prover first developed in the 1980s by Lawrence Paulson and Tobias Nipkow [Citation32, Citation36]. Today, Isabelle provides the Isar language for writing proofs. Isar proofs are hierarchically structured, and they include enough redundancy—consisting of explicitly stated assumptions and goals—to be understandable to humans as well as machines. Isabelle supports multiple different logical formalisms, with Isabelle/HOL specifically based on higher order logic. Unlike proof assistants based on constructive type theories with dependent types, it implements simple type theory. Isabelle/HOL admits classical (non-constructive) proofs, i.e. it accepts the law of the excluded middle, , as well as the axiom of choice.
Simple type theory and classical logic allow for powerful automation, which is a significant advantage. Isabelle/HOL’s main automation tool is Sledgehammer [Citation2], which searches for proofs by calling external automated theorem provers, and then converts the proofs discovered to Isar. Two counterexample finding tools, nitpick and Quickcheck, are also provided. Moreover, the command try0 calls a number of built-in proof methods (such as simplification) in search of a proof. The user interface is Isabelle/jEdit, which provides real-time proof checking, as well as rich semantic information for the formal text and direct links to the user manuals and tutorials.Footnote5 We have written an introduction to Isabelle aimed at mathematicians [Citation27].
The Isabelle/HOL libraries and the AFP contain a vast amount of formalized material, covering many areas of pure mathematics as well as in computer science, logic and even philosophy.
3 On the irrationality of certain series
3.1 Results formalized
The following results are by Erdős and Straus [Citation13].
Theorem 3.1
(Erdős and Straus [13, Theorem 2.1]). Let be a sequence of integers and
a sequence of positive integers with
for all large n and
Then the sum
is rational if and only if there exists a positive integer B and a sequence of integers
so that for all large n,
Corollary 3.1 (Erdős and Straus [13, Corollary 2.10]). Let and
satisfy the hypotheses of the theorem above and in addition that for all large n we have
and
. Then the sum
is irrational.
Theorem 3.2
(Erdős and Straus [13, Theorem 3.1]).
Let pn be the nth prime number and let be a monotonic sequence of positive integers satisfying
and
. Then the sum
is irrational.
3.2 On the formalization
Our full formalization is online [Citation30] and here we discuss certain key points. The formalization process turned out to be time-consuming: we had to fill in a number of intermediate reasoning steps that had to be made explicit in the formal proofs. For example, in the proof of Theorem 3.2 ([13, Theorem 3.1]), for the sequence of integers (for which
for large n), Erdős and Straus claim that since it is unbounded, for large n there must exist an index
so that
. In order to prove this claim, we had first to show that for large n the sequence
is neither monotone increasing nor monotone decreasing, which required some work.
We moreover had to do a certain amount of restructuring for the formalization of the proof of Theorem 3.1 ([13, Theorem 2.1]), as the original proof is based on a sketch of a construction for obtaining the terms of the desired sequence , starting with the construction of the first two terms for large enough n (“Proceeding in this manner we get the desired sequence”). In particular, in the sufficient direction, we need to obtain a positive integer B and a sequence of integers
that satisfy
and
for all large enough n, where
and
are sequences of integers as in the assumptions of the theorem. In the middle of the proof, we will have for a large enough N
(3.1)
(3.1) where
is a sequence such that for all
and
(3.2)
(3.2)
To construct a suitable , we let cN be the integer nearest to
and inductively construct
(for all
). This was achieved by defining a recursive function in Isabelle/HOL:
fun get_c::"(nat
int)
(nat
int)
int
nat
(nat
int)" where
"get_c a’b’ B N 0 = round (B * b’ N/a’ N)" |
"get_c a’ b’ B N (Suc n) = get_c a’ b’ B N n * a’ (n + N)
- B * b’ (n + N)"
where round returns the nearest integer of an input number. By construction, we already have so the goal is to show
(for all
). To achieve this, we deploy proof by induction to derive
where (3.1) has been used for the base case and (3.2) for the inductive one. We can then close the proof by having
The original proof repeatedly invokes variants of (3.1) and tries to prove cn is the closest integer to for all
(e.g., (2.7)-(2.9) in the original proof). We believe our altered proof might be more straightforward.
We also noted that in the original proofs of Corollary 3.1 ([13, Corollary 2.10]) and Theorem 3.2 ([13, Theorem 3.1]) certain inequalities required some minor corrections (for details we refer to our formalization [Citation30]), which fortunately did not affect the correctness of the statements nor the original proof structures.
Our formalized versions of the statements of Theorem 3.1 ([13, Theorem 2.1]), Corollary 3.1 ([13, Corollary 2.10]), and Theorem 3.2 ([13, Theorem 3.1]) read as follows:
theorem theorem_2_1_Erdos_Straus:
fixes a b:: "nat⇒int"
assumes "∀ n. a n >0" and "∀F n in sequentially. a n > 1"
and "(λn. |b n|/(a (n-1)*a n))→ 0"
shows "(Σn. (b n/(π i ≤ n. a i))) ∈ Q $
(∃ (B::int)>0. ∃ c::nat⇒ int.
(∀F n in sequentially. B*b n = c n * a n - c(n + 1)
∧|c(n + 1)|<a n/2)) "
corollary corollary_2_10_Erdos_Straus:
fixes a b:: "nat⇒int"
assumes "∀ n. a n >0" and "∀F n in sequentially. a n > 1"
and "(λn. |b n|/(a (n-1)*a n)) → 0"
and "∀F n in sequentially. b n > 0 ∧ a (n + 1) ≥ a n"
and "lim (λn. (b(n + 1) - b n)/a n) ≤ 0"
and "convergent (λn. (b(n + 1) - b n)/a n)"
and "liminf (λn. a n/b n) = 0 "
shows "(Σn. (b n/(π i ≤ n. a i)))/ ∉ Q"
theorem theorem_3_10_Erdos_Straus:
fixes a::"nat ⇒ int"
assumes "∀ n. a n >0" and "mono a"
and "(λn. nth_prime n/(a n)ˆ2) → 0"
and "liminf (λn. a n/nth_prime n) = 0"
shows "(Σn. (nth _prime n/(π i ≤ n. a i)))/ ∉ Q "
Above, sequentially is a filter [Citation25] for expressing limits indexed by positive integers. So n in sequentially. P n expresses that P n holds for all sufficiently large n.
3.2.1 Calculations with real asymptotics
Asymptotic arguments (e.g., limits and statements of the form “for all sufficiently large n”) present real difficulties for mechanized proofs. Justifications omitted as obvious in paper proofs need to be written out. For example, in the proof of Theorem 3.2 ([13, Theorem 3.1]) we need to formally prove(3.3)
(3.3) where B is a positive constant. Since the proof of (3.3) involves routine calculations of limits, this step is completely omitted in the paper proof. In Isabelle/HOL, (3.3) is encoded as
"∀F n in sequentially. 8*B*ln n + 1 < sqrt n/4"
Our first attempt resulted in the following proof:
have "∀F n in sequentially. 8*B*ln n + 1 < sqrt n/4"
proof -
have "(λn. ln n/sqrt n) → 0"
using lim_ln_over_power[of "1/2"] powr_half_sqrt by simp
from tendsto_mult_right_zero[OF this]
have "(λn. 8 * B * ln n/sqrt n) → 0" by auto
moreover have "(λn. 1/sqrt n) → 0"
apply (rule real_tendsto_divide_at_top)
by (auto simp add: filterlim_of_nat_at_top sqrt_at_top)
ultimately have "(λn. 8*B*ln n/sqrt n + 1/sqrt n) → 0"
by (auto intro:tendsto_add_zero)
from tendstoD[OF this, of "1/4"]
have "∀F n in sequentially. 8*B*ln n/sqrt n + 1/sqrt n < 1/4"
unfolding abs_less_iff dist_real_def
by (auto elim!:eventually_mono)
then show? thesis
using eventually_gt_at_top[of 0]
by (eventually_elim, auto simp:field_simps)
qed
The proof is tedious. Let us work through the text above. We start with primitive properties of the ln and square root functions,written” (λ n. ln n/sqrt n) → 0” and” (λ n. 1/sqrt n) → 0” in the script. Then we need to manually transform a property of limits,
to the proposition that
by applying the lemma tendstoD:
Fortunately, Manuel Eberl [Citation8] had just implemented the tactic real_asymp that solves (3.3) automatically, so we can reduce our initial proof to just one line:
have "∀F n in sequentially. 8*B*ln n + 1 < sqrt n/4"
by real_asymp
The basic idea behind real_asymp is to approximate a continuous function when its argument approaches a limit L using Poincaré expansion (or asymptotic series):
where
and
is a sequence of continuous functions such that
for all n as
.Footnote6 Through arithmetic on the formal series, the asymptotic behavior of a function at L can then be decided. In general, real_asymp handles functions built from basic arithmetic and elementary functions including exp, ln, and sin.
In situations involving limits, due to the presence of bound variables, Sledgehammer’s automation helps little, so real_asymp makes a real contribution. Such domain-specific automation inspired from computer algebra systems will surely prove to be of great use when mechanising mathematical proofs.
But in general we have found Sledgehammer to be of tremendous value. A telltale sign that it has been used is any occurrence of the proof methods metis or smt, which are littered throughout our formal proofs [28–30]. These methods are difficult to use manually, and any instances of them are almost certain to have been generated by Sledgehammer.
3.2.2 Reasoning with prime numbers
Elementary properties concerning prime numbers are often assumed in informal proofs, with supporting calculations largely omitted. Thanks to the mechanised proof of the prime number theorem (PNT) [Citation10] as well as another development on primes [Citation9] from the AFP, basic reasoning with prime numbers is feasible. However, there are still some tedious, repetitious calculations, which could be optimised in the future. For example, the first step in proving Theorem 3.2 ([13, Theorem 3.1]) is to demonstrate that the hypotheses of Theorem 3.1 ([13, Theorem 2.1]) are satisfied by the former’s assumptions. Although this step has been completely omitted in the informal proof, we need to explicitly derive from
, where pn is the
prime number and
is a monotonic sequence of positive integers. The derivation works as follows:
The second inequality requires a basic property of the prime numbers:(3.4)
(3.4)
Starting with a form of the PNT, namely , this property alone takes 28 lines of tedious limit calculations similar to those in our first attempt to prove (3.3). Another example from the proof of Theorem 3.2 ([13, Theorem 3.1]) is the following: considering the interval
for large enough N and given that
implies
, we need to deduce that
happens for fewer than half of the integers within
. The derivation works because
(3.5)
(3.5) and
when
and N is large enough. Property (3.5) is formulated in Isabelle/HOL as
lemma nth_prime_double_sqrt_less:
assumes "ε > 0"
shows "∀ F n in sequentially. (nth_prime (2*n) - nth_prime n)
/sqrt (nth_prime n) < n powr (1/2 + ε)"
where we remind that n in sequentially stands for the “large enough” quantifier. Despite (3.5) being merely claimed without proof in the paper, it took us 56 lines to prove it in Isabelle/HOL starting from the PNT. In short, we are happy that properties related to the prime numbers are now viable in Isabelle/HOL, but we wish we could derive their asymptotic properties like (3.4) and (3.5) with fewer tears. Automating this process through the tactic real_asymp is desirable but the tactic does not yet support the function nth_prime: in contrast to elementary functions like exp, nth_prime cannot be asymptotically expanded to an arbitrary order. Incorporating nth_prime into real_asymp may require special treatment.
4 Irrational rapidly convergent series
4.1 Results formalized
The following results are by Hančl [Citation23].
Theorem 4.1
(Hančl [23, Theorem 3]). Let A > 1 be a real number. Let be a sequence of real numbers greater than one. Let
and
be sequences of positive integers such that
and for all sufficiently large n
Then the sum
is irrational.
Corollary 4.1 (Hančl [23, Corollary 2]). Let A > 1 and let and
be sequences of positive integers such that
. Then, assuming that for every sufficiently large positive integer n,
and
, the sum
is irrational.
This corollary follows directly from the theorem by an appropriate choice of the sequence , in particular by choosing
so that the theorem can be directly applied.
4.2 On the formalization
Our full formalization is online on the AFP [Citation28]. This was the first work in this area that we formalized, starting in early 2018. We chose this research paper as the proofs are interesting but not too complicated, and even though this is a topic in number theory, they are based on fundamental material from analysis. Of course, we also had to show quite a few auxiliary facts, including lemmas on summability criteria for series. For instance, we modified the classic ratio test for series (which was already available in the HOL Library):
lemma summable_ratio_test:
fixes c:: real and N:: nat
and f:: "nat ⇒ ’a:: real_normed_vector"
assumes "c < 1" and "∀n ≥ N. norm (f (Suc n)) ≤ c * norm (f n)"
shows "summable f"
The version modified in terms of limits is given below:
lemma summable_ratio_test_tendsto:
fixes c:: real and f:: "nat ⇒ ’a:: real_normed_vector"
assumes "c < 1" and "∀n. f n _= 0"
and tendsto_c: "(λn. norm (f (Suc n))/norm (f n)) → c"
shows "summable f"
proof -
from hc < 1i tendsto_c
obtain N where N_dist:
"∀n ≥ N. dist (norm (f (Suc n))/norm (f n)) c < (1-c)/2"
unfolding tendsto_iff eventually_sequentially
by (meson diff_gt_0_iff_gt zero_less_divide_iff zero_less_numeral)
have "norm (f (Suc n))/norm (f n) ≤ (1 + c)/2" when "n ≥ N" for n
using N_dist[rule_format,OF hn ≥ Ni] hc < 1i
by (auto simp add:field_simps dist_norm,argo)
then have "norm (f (Suc n)) ≤ (1 + c)/2 * norm (f n)"
when "n ≥ N" for n
using hn ≥ Ni h∀n. f n ≠ 0i by (auto simp add:divide_simps)
moreover have "(1 + c)/2 < 1" using hc < 1i by auto
ultimately show "summable f"
using summable_ratio_test by blast
qed
Our formalized version of Theorem 4.1 ([23, Theorem 3]) is presented below. We include its formalized proof in the Appendix. The reader will notice that the formal proof is only around 50 lines of Isar code; however, it comes after about 800 lines of auxiliary lemmas that are required for this specific proof. As in the original—and after having established the summability of the series in question—the proof is by contradiction. Assuming that the sum is rational, for a quantity we show that
for all
. After that, we show the existence of an
for which
, a contradiction: hence, the sum of the series is irrational. As already mentioned, Isar admits structured proofs, which are more readable than the usual sequence of proof commands. In our proof, the structure of a proof by contradiction is visible:
proof (rule ccontr) […] show False […] qed.
theorem Hancl3:
fixes d::"nat⇒real" and a b:: "nat⇒int"
assumes "A > 1" and d: "∀n. d n > 1"
and a: "∀n. a n > 0" and b: "∀n. b n > 0" and "s > 0"
and "(λn. (a n) powr(1/of_int(2ˆn))) → A"
and "∀n ≥ s. A/(a n) powr (1/of_int(2ˆn)) > (πj. d (n + j))"
and "LIM n sequentially. d n ˆ 2 ˆ n/b n:> at_top"
and "convergent_prod d"
shows "(Σn. b n/a n)/∉ Q
Our formalized version of Corollary 4.1 ([23, Corollary 2]) is as follows:
corollary Hancl3corollary:
fixes A::real and a b:: "nat⇒int"
assumes "A > 1" and a: "∀n. a n > 0" and b: "∀n. b n > 0"
and "(λn. (a n) powr(1/of_int(2ˆn))) → A"
and "∀n ≥ 6. a n powr(1/of_int (2ˆn)) * (1 + 4*(2/3)ˆn) ≤ A
∧ b n ≤ 2 powr (4/3)ˆ(n-1)"
shows "(Σn. b n/a n) ∉ Q "
It is interesting to note that while in Corollary 4.1 ([23, Corollary 2]) we have an assumption that holds “for all sufficiently large positive integer n”, in our formalized version above this assumption is actually specified as “for all ”. In our formalized version of Theorem 4.1 ([23, Theorem 3]) above, “for all sufficiently large n” had been written simply as “
” where s > 0 is some fixed, unknown number. Statements of the form “for all sufficiently large” can also be encoded in a more abstract fashion using the keyword eventually, which does not require an additional variable.
4.2.1 Infinite products
This material requires the notion of an infinite product, which was then (early 2018) not available in Isabelle except in a small development by Manuel Eberl. Infinite sums were available, and because we were only dealing with positive reals, the infinite products could be reduced to infinite sums via logarithms. However, since one objective of this work was to identify and fill gaps in the libraries, we decided to formalize infinite products directly. We extended Eberl’s development to a comprehensive Infinite_Products library and included it in the next Isabelle release (Isabelle 2018).
Consider the following technical lemma, where we make use of :
lemma show8:
fixes d::"nat⇒real " and a::"nat⇒int" and s k::nat
assumes "A > 1" and d: "∀n. d n> 1" and "∀n. a n > 0" and "s > 0"
and "convergent_prod d"
and "∀n ≥ s. A/(a n) powr(1/(2::int)ˆn) > (πj. d(n + j))"
shows "∀n ≥ s. (πj. d (j + n)) < A
/(MAX j∈{s.n}. a j powr (1/(2::int) ˆ j))"
This had been originally formulated using ∑ via the natural logarithm in the first version, so the conclusion instead read
shows "∀n ≥ s. exp (Σj. ln(d (j + n))) < A
/(MAX j∈{s.n}. a j powr (1/(2::int) ˆ j))"
By convention, infinite products are defined in two stages [1, p. 241].
For nonzero complex numbers, the infinite product
is defined to converge if the finite partial products converge to a nonzero limit. If that limit is zero, then the infinite product diverges to zero.
If finitely many of the uk equal zero and the infinite product of the nonzero terms converges in the sense of (1), then
converges to zero.
One advantage of this approach is that and similar identities hold.
5 The transcendence of certain infinite series
5.1 Results formalized
The following results are by Hančl and Rucki in [Citation24].
Theorem 5.1
(Hančl and Rucki [24, Theorem 2.1]). Let δ be a positive real number. Let and
be sequences of positive integers such that
and
Then the sumis transcendental.
Theorem 5.2
(Hančl and Rucki [24, Theorem 2.2]). Let δ and be positive real numbers. Let
and
be sequences of positive integers such that
and for every sufficiently large n
Then the sum
is transcendental.
The two theorems above are in fact corollaries of Roth’s celebrated result on rational approximations to algebraic numbers from 1955:
Theorem 5.3
(Roth [Citation38]). Let a be any algebraic number, not rational. If the inequality
has infinitely many solutions in coprime integers p and q where q > 0, then
.
In particular, the two theorems above by Hančl and Rucki [Citation24] asserting the transcendence of the sum of certain series are shown by finding infinitely many such integer solutions of the inequality with a the sum of the series in question and for some .
5.2 On the formalization
Our full formalization is online [Citation29].Footnote7 In this work, too, we had to fill in certain arguments that had been omitted in the original paper, for example to first show that the series in question were summable.
5.2.1 Implementing Roth’s Theorem
As already mentioned, for the proofs of Theorems 5.1 ([24, Theorem 2.1]) and 5.2 ([24, Theorem 2.2]), we applied Theorem 5.3: Roth’s theorem on rational approximations to algebraic numbers [Citation38]. The proof of this celebrated theorem is long and elaborate; it has not been formalized in any proof assistant, to our knowledge. In our formalization, we have thus used the statement of Roth’s theorem merely as an assumption, instead of formalizing Roth’s proof itself beforehand. Roth’s theorem was implemented within a locale as follows:
locale RothsTheorem =
assumes RothsTheorem:
"∀ξ κ. algebraic ξ∧ ξ/∉ Q ∧ infinite {(p ,q ). (q::int)>0∧
coprime p q ∧ |ξ - p/q | < 1/q powr κ} → κ ≤ 2"
A locale collects parameters and assumptions, which it packages as a context in which to work. This locale simply packages the assumption that Roth’s theorem is true. It is a safer option than assuming the theorem as an axiom. Theorems 5.1 ([24, Theorem 2.1]) and 5.2 ([24, Theorem 2.2]) were then formalized within this locale:
theorem (in RothsTheorem) HanclRucki1:
fixes a b:: "nat ⇒ int" and δ:: real
assumes "∀k. a k > 0" and "∀k. b k > 0" and "δ > 0"
and "limsup (λk. a(k + 1)/
(πi = 0.k. a i) powr (2 + δ) * (1/b(k + 1))) = ∞"
and "liminf (λk. a (k + 1)/a k * b k/b (k + 1)) > 1"
shows "¬ algebraic (Σk. b k/a k)"
theorem (in RothsTheorem) HanclRucki2:
fixes a b:: "nat⇒int" and δ ε:: real and t:: nat
assumes "∀k. a k > 0" and "∀k. b k > 0" and "δ > 0" "ε > 0"
and "limsup (λk. a(k + 1)/(πi = 0.k. a i) powr (2 + 2/ε + δ)
* (1/b(k + 1))) = ∞"
and "∀k ≥ t. (a (k + 1)/b (k + 1)) powr (1/(1 + ε))
≥ (a k/b k) powr (1/(1 + ε)) + 1"
shows "¬ algebraic (Σ k. b k/a k)"
Assuming a key theorem whose proof has not been formalized could be seen as compromising the vision of absolute correctness, which demands a bottom-up approach: to formalize the proofs of all prerequisites. However, ours is a more realistic approach that reflects actual mathematical practice. It still guarantees the correctness of the arguments we formalized, which assume Roth’s theorem. Normally, whenever mathematicians make use of a theorem in their work, they do not prove it themselves all over again but they trust the proof in the literature. But when writing a proof with a proof assistant, one would ideally expect that every result used would also be formally verified as well. Here, however, Roth’s theorem is used even though its proof hasn’t been formalized; we just trust its informal proof [38], just like a traditional mathematician would do.
Our approach has the significant advantage of reaching the formalization of more advanced mathematics faster. Nevertheless, care must be taken to avoid propagating errors in the literature. We recommend that a formally unverified theorem should be taken as an assumption within a formalization only if it is a fundamental result that has been checked by many mathematicians in the past. Roth’s famous theorem can be considered safe in this sense. And all such assumptions must be declared openly [27, Section 5].
5.2.2 A small issue in the original proof
The process of the formalization helped to reveal a slight mistake in one of the original proofs. It puzzled us for some time, although the eventual fix was straightforward. In the proof of Theorem 5.1 [24, page 534] it is claimed that from the assumptions it follows that for each real number A > 1, there exists a positive integer k0 such that for all ,
During the formalization process, we noticed that there is a problem with this claim. Consider the counterexample where we set if k is odd, and
otherwise, with bk = 1 for all k. To resolve this, we had suggested a slightly modified version of the original statement and a different proof. However, this turned out to be unnecessary; the authors suggested to us via email that the problem could be resolved by replacing “for each real number A > 1” with “there exists a real number A > 1” above. This suggestion repaired the proof, and the rest of our formalization [Citation29] followed the original proof [24] without any further problems.
6 On the de Bruijn factors
The de Bruijn factor, introduced in 1977 by van Benthem Jutting [Citation26], is the ratio of the size of a formalization (in symbols) to the size of the original mathematical text (in words). Freek Wiedijk [Citation40] recommends making this more precise by comparing the number of bytes in the computer encodings (using the LATEX source of the mathematics), even compressing both files to ensure independence from such factors as the lengths of identifiers. Such precision is of questionable value, given the enormous variations in the density of mathematical texts.
Here, we only give approximate estimates of the de Bruijn factors for our formalizations, counting the number of lines in our formalizations and in the respective published informal proofs. We consider the entire amount of the material (i.e. statements together with their proofs as well as corollaries together with their proofs) in each formalization work. Given our formalization [Citation30] which spans around 1960 lines, we estimate the de Bruijn factor for Theorem 3.1 ([13, Theorem 2.1]), Corollary 3.1 ([13, Corollary 2.10]), and Theorem 3.2 ([13, Theorem 3.1]) and their respective proofs altogether to be around 25. Given our formalization [Citation28] which spans around 1054 lines, we estimate the de Bruijn factor for Theorem 4.1 ([23, Theorem 3]), Corollary 4.1 ([23, Corollary 2]) and their respective proofs altogether to be around 21. Finally, given our formalization [Citation29] which spans around 990 lines, we estimate the de Bruijn factor for Theorems 5.1 ([24, Theorem 2.1]) and 5.2 ([24, Theorem 2.2]) and their respective proofs altogether to be around 13. These de Bruijn factors can be considered high—Wiedijk reports lower factors for some formalizations in HOL [Citation41]—because we had to fill in a considerable amount of intermediate arguments which had not been made explicit in the original proofs, especially in the case of Erdős–Straus [Citation13].
7 Conclusion
We have formalized, in Isabelle/HOL [Citation28–30], several results on irrationality and transcendence criteria for infinite series from three research papers: by Erdős–Straus (1974) [Citation13], Hančl (2002) [Citation23] and Hančl–Rucki (2005) [Citation24].
We formalized results from mainstream journal papers in their original, unpolished form. Landmarks in the world of formalized mathematics include proving the four-colour theorem in Coq by Georges Gonthier [Citation15], the Kepler conjecture [Citation18] in HOL Light and Isabelle by Thomas C. Hales et al. [Citation19] and, more recently, the solution to the Cap Set Problem in Lean by Dahmen, Hölzl and Lewis [Citation7], the formalization of perfectoid spaces in Lean by Buzzard, Commelin and Massot [Citation6] and the formalization of Grothendieck’s Schemes in Isabelle/HOL by Bordg, Paulson and Li [Citation3, Citation4]. More in the spirit of our work is the formalization in Isabelle/HOL by Gouëzel of a recent paper by Shchur in the theory of Gromov-hyperbolic spaces, where a mistake in the original paper was moreover discovered through the formalization process [Citation16].
Summarizing the lessons that we learned through our formalization, we can now revisit the questions posed in the Introduction:
The amount of preliminary content in the current Isabelle/HOL libraries was indeed satisfactory. We had a priori chosen material that we expected would be within reach, and indeed, we didn’t run into significant deficiencies in the library. Some prerequisites that had to be added were a detailed formalization of infinite products, a modification of the ratio test for series and some facts about prime numbers. All these additions were doable within a reasonable amount of time and we can expect that they will be used in many other contexts. Our formalizations were thus an incentive to enrich the library.
In general we have found Sledgehammer’s automation to be of immense help, as hinted by the many occurrences throughout our formal proofs of proof methods such as metis and smt, which are generated by Sledgehammer.
As discussed in Section 6, the formal proofs were significantly longer than the original, which is reflected by the fairly high de Bruijn factors (approximately 25, 21, and 13 for the material [Citation13, Citation23, Citation24] respectively). This was certainly not unexpected.
We did not discover any major issues in the original proofs. We did however need to fill in some missing steps or calculations, some of which were proofs of summability of infinite series. These omissions contributed to the high de Bruijn factors. Regarding Erdős–Straus [Citation13], one of the proofs was slightly reformulated, while in other proofs a few inequalities required some minor corrections that did not, however, affect the correctness of the statements or the general proof structure. In Hančl–Rucki [Citation24], a slight mistake was easily fixed thanks to a hint from the authors. Our detailed, complete formal proofs may even be useful to the readers who wish to study the original proofs.
Overall, we found the formalization process to be interesting and educational. We hope that our feedback may be useful both to experienced proof assistant users and to mathematicians interested in attempting to formalize some research work for the first time.
Acknowledgements
The authors thank the ERC for their support through the ERC Advanced Grant ALEXANDRIA (Project GA 742178). We thank the anonymous reviewers for their useful suggestions that improved our exposition. We also thank Jaroslav Hančl and Pavel Rucki for a helpful email suggesting a fix for the slight mistake in their proof [24, Theorem 2.1], Iosif Pinelis for his useful explanation on MathOverflow regarding an argument in the proof of Theorem 2.2 in the same articleFootnote8 and Anthony Bordg for a discussion on the necessity of formalizing advanced mathematics even if the proofs of the prerequisites are not formalized.
7 Declaration of Interest
No potential conflict of interest was reported by the author(s).
Correction Statement
This article has been republished with minor changes. These changes do not impact the academic content of the article.
Notes
1 isabelle.in.tum.de/dist/library/HOL
3 The proof of Roth’s theorem itself has not been formalized; see Section 5.
5 These can also be found under “Documentation” on the Isabelle webpage.
6 as x approaches the filter F
7 Here, is the ‘little o’ notation, and
as
is equivalent to
.
8 The reader may notice that this development too seems to depend on the AFP entry for the Prime Number Theorem [10]. However we only use some technical lemmas from that development. Primes play no role here.
References
- Bak, J., and Newman, D. J. (2010). Complex Analysis, 3rd ed. Springer.
- Blanchette, J. C., Kaliszyk, C., Paulson, L. C., Urban, C. (2016). Hammering towards QED. J. Formaliz. Reason. 9(1): 101–148.
- Bordg, A., Paulson C., L. and Li, W., Grothendieck’s Schemes in Algebraic Geometry, Archive of Formal Proofs (2021), formal proof development, Available at: https://isa-afp.org/entries/Grothendieck_Schemes.html.
- Bordg, A., Paulson C., L. and Li, W. (2021). Simple Type Theory is not too Simple: Grothendieck’s Schemes without Dependent Types, arXiv:2104.09366.
- Boyer, R. et al., (1994). The QED Manifesto. In Bundy, A., ed. Automated Deduction – CADE 12, LNAI 814, Springer-Verlag; pp. 238–251.
- Buzzard, K., Commelin, J., Massot, P. Formalising perfectoid spaces, arXiv:1910.12320, in CPP 2020: Proceedings of the 9th ACM SIGPLAN International Conference on Certified Programs and Proofs, pp. 299–312.
- Dahmen, S. R., Hölzl, J., Lewis, R. Y. (2019). Formalizing the Solution to the Cap Set Problem, arXiv:1907.01449v1, Proceedings of Interactive Theorem Proving (ITP). https://drops.dagstuhl.de/opus/volltexte.
- Eberl, M. (2019). Verified real asymptotics in Isabelle/HOL. In Proceedings of International Symposium on Symbolic and Algebraic Computation. . doi:https://doi.org/10.1145/3326229.3326240
- Eberl, M. Elementary Facts About the Distribution of Primes, Archive of Formal Proofs (2019), formal proof development. Available at: https://www.isa-afp.org/entries/Prime_Distribution_Elementary.html.
- Eberl, M., Paulson, L.C. The Prime Number Theorem, Archive of Formal Proofs (2018), formal proof development. Available at: https://www.isa-afp.org/entries/Prime_Number_Theorem.html.
- Erdős, P. (1958). Sur certaines series à valeur irrationelle. Enseignment Math. 4: 93–100.
- Erdős, P. (1975). Some problems and results on the irrationality of the sum of infinite series. J. Math. Sci. 10: 1–7.
- Erdős, P., Straus, E. G. (1974). On the irrationality of certain series. Pacific J. Math. 55(1): 85–92. doi:https://doi.org/10.2140/pjm.1974.55.85
- Erdős, P., Straus, E. G. (1971). Some number theoretic results. Pacific J. Math. 36: 635–646. doi:https://doi.org/10.2140/pjm.1971.36.635
- Gonthier, G. (2008). Formal proof–the four-color theorem. Not. Am. Math. Soc. 55(11): 1382–1393.
- Gouëzel, S., Shchur, V. (2019). A corrected quantitative version of the Morse lemma. J. Funct. Anal. 277(4): 1258–1268.
- Gowers, W. T. (2010). Rough structure and classification. In: Alon N., Bourgain J., Connes A., Gromov M., Milman V., eds. Visions in Mathematics. Modern Birkhäuser Classics. Birkhäuser Basel. pp. 79–117.
- Hales, T. C. (2005). A proof of the Kepler conjecture. Ann. Math. 162 (3): 1065–1185. doi:https://doi.org/10.4007/annals.2005.162.1065
- Hales, T., Adams, M., Bauer, G., Dang, T. D., Harrison, J., Hoang, L. T., Kaliszyk, C., Magron, V., McLaughlin, S., Nguyen, T. T., Nguyen, Q. T., Nipkow, T., Obua, S., Pleso, J., Rute, J., Solovyev, A., Ta, T. H. A., Tran, N. T., Trieu, T. D., Urban, J., Vu, K., Zumkeller, R. (2017). A formal proof of the Kepler conjecture. Forum Math. Pi, 5, E2.
- Hančl, J. (1993). Criterion for irrational sequences. J. Number Theory. 43(1): 88–92.
- Hančl, J. (1996). Transcendental sequences. Math. Slov. 46, 177–179.
- Hančl, J. (2001). Two criteria for transcendental sequences, Le Matematiche, 56: 149–160.
- Hančl, J. (2002). Irrational rapidly convergent series. Rend. Sem. Mat. Univ. Padova 107, 225–231.
- Hančl, J., Rucki, P. (2005), The transcendence of certain infinite series. Rocky Mountain J. Math. 35(2): 531–537. doi:https://doi.org/10.1216/rmjm/1181069744
- Hölzl, J., Immler, F. and Huffman, B. Type classes and filters for mathematical analysis in Isabelle/HOL. In International Conference on Interactive Theorem Proving 2013 Jul 22. Berlin, Heidelberg: Springer; pp. 279–294.
- Jutting, L.S. van Benthem. Checking Landau’s “Grundlagen” in the AUTOMATH system. PhD thesis, Eindhoven University of Technology (1977). https://doi.org/10.6100/IR23183.
- Koutsoukou-Argyraki, A., Formalising Mathematics – in Praxis; A Mathematician’s First Experiences with Isabelle/HOL and the Why and How of Getting Started, Jahresber. Dtsch. Math. Ver. 123, pp. 3–26(2021). doi:https://doi.org/10.1365/s13291-020-00221-1.
- Koutsoukou-Argyraki, A., Li, W., Irrational rapidly convergent series, Archive of Formal Proofs (2018), formal proof development. Available at: https://www.isa-afp.org/entries/Irrationality_J_Hancl.html.
- Koutsoukou-Argyraki, A., Li, W., The transcendence of certain infinite series, Archive of Formal Proofs (2019), formal proof development, Available at: https://www.isa-afp.org/entries/Transcendence_Series_Hancl_Rucki.html.
- Koutsoukou-Argyraki, A., Li, W., Irrationality Criteria for Series by Erdős and Straus, Archive of Formal Proofs (2020), formal proof development. Available at: https://www.isa-afp.org/entries/Irrational_Series_Erdos_Straus.html.
- Mahler, K. (1976). Lectures on Transcendental Numbers, Lecture Notes in Mathematics, Vol. 546, Springer-Verlag Berlin-Heidelberg-New York.
- Nipkow, T., Paulson, L. C., Wenzel, M. (2002). Isabelle/HOL: A Proof Assistant for Higher-Order Logic, Lecture Notes in Computer Science, Vol. 2283, Springer-Verlag Berlin-Heidelberg.
- Nishioka, K. (1996). Mahler Functions and Transcendence, Lecture Notes in Mathematics 1631, Springer-Verlag Berlin-Heidelberg.
- Nyblom, M.A. (2000). A theorem on transcendence of infinite series. Rocky Mountain J. Math. 30: 1111–1120. doi:https://doi.org/10.1216/rmjm/1021477261
- Nyblom, M. A. (2001). A theorem on transcendence of infinite series II. J. Number Theory. 91: 71–80. doi:https://doi.org/10.1006/jnth.2001.2672
- Paulson, L. C. (1989). The foundation of a generic theorem prover, J. Autom. Reason. 5(3): 363–397, Kluwer Academic Publishers.
- Paulson, L. C., ALEXANDRIA: Large-scale formal proof for the working mathematician, project description, (12 Nov. 2018). Available at: http://www.cl.cam.ac.uk/lp15/Grants/Alexandria/.
- Roth, K. F. (1955). Rational approximations to algebraic numbers. Mathematica 2, Part 1(3): 1–20. doi:https://doi.org/10.1112/S0025579300000644
- Sándor, J. (1994). Some classes of irrational numbers. Studia Univ. Babeş-Bolyai Math., 29: 3–12.
- Wiedijk, F., The De Bruijn Factor, Technical report. Nijmegen, Netherlands: Department of Computer Science Nijmegen University. Available at: https://www.cs.ru.nl/freek/factor/factor.pdf.
- Wiedijk, F. The De Bruijn Factor (webpage). Available at: http://www.cs.ru.nl/freek/factor/.
Appendix
Our formalized version of Theorem 4.1 ([23, Theorem 3]) along with its formalized proof as in [Citation28] is presented below:
theorem Hancl3:
fixes d:: "nat⇒real " and a b::"nat⇒int" and A::real and s::nat
assumes "A > 1" and d:"∀n. d n> 1" and a:"∀n. a n > 0"
and b:"∀n. b n > 0" and "s > 0"
and assu1: "( (λ n. a n powr (1/(2::int)ˆn))→ A) sequentially"
and assu2: "∀n ≥ s. A/a n powr (1/(2::int)ˆn)> (πj. d (n + j))"
and assu3: "LIM n sequentially. d n ˆ 2 ˆ n/b n:> at_top"
and "convergent_prod d"
shows "(Σ n. b n/a n)/∉ Q"
proof(rule ccontr)
assume asm:"¬ ((Σn. b n/a n) /∉ Q)"
have ab_sum:"summable (λj. b j/a j)"
using issummable[OF hA>1i d a b assu1 assu3 hconvergent_prod di].
obtain p q::int where "q > 0"
and pq_def:"(λ n. b (n + 1)/a (n + 1)) sums (p/q)"
proof -
from asm have "(Σn. b n/a n) ∈ Q" by auto
then have "(Σn. b (n + 1)/a (n + 1)) ∈ Q "
by (subst suminf_minus_initial_segment[OF ab_sum,of 1]) auto
then obtain p’ q’::int where "q’_=0"
and pq_def:"(λ n. b (n + 1)/a (n + 1)) sums (p’/q’)"
unfolding Rats_eq_int_div_int
using summable_ignore_initial_segment[OF ab_sum,of 1,THEN summable_sums]
by force
define p q where "p=(if q’<0 then - p’ else p’)"
and "q=(if q’<0 then - q’ else q’)"
have "p’/q’=p/q" and "q > 0"
using hq’≠0i unfolding p_def q_def by auto
then show? thesis using that[of q p] pq_def by auto
qed
define ALPHA where
"ALPHA = (λn. q * (πj = 1.n. a j) * (Σj. b (j + n + 1)/a (j + n + 1)))"
have "ALPHA n ≥ 1" for n
proof -
have "(Σn. b (n + 1)/a (n + 1)) > 0"
apply (rule suminf_pos)
using summable_ignore_initial_segment[OF ab_sum,of 1] a b by auto
then have "p/q > 0" using sums_unique[OF pq_def,symmetric] by auto
then have "q ≥ 1" and "p ≥ 1" using hq > 0i by (auto simp add:divide_simps)
moreover have "∀n. 1 ≤ a n" and "∀n. 1 ≤ b n"
using a b by (auto simp add: int_one_le_iff_zero_less)
ultimately show? thesis unfolding ALPHA_def
using show7[OF _ _ _ _ pq_def] by auto
qed
moreover have "∃n. ALPHA n < 1" unfolding ALPHA_def
proof (rule lasttoshow[OF d a hs > 0i hq > 0i hA > 1i b _ assu3])
show "∀ F n in sequentially. (Σj. b (n + j)/a(n + j))
≤ 2 * b n/a n"
using show5[OF hA > 1 i d a b assu1 assu3 hconvergent_prod di] by simp
show "∀m ≥ s. ∃n ≥ m. d (n + 1)
* (MAX j∈{s.n}. a j powr (1/(2::int) ˆ j))
≤ a (n + 1) powr (1/(2::int) ˆ (n + 1))"
apply (rule showFootnote9[OF hA > 1i d a hs > 0i assu1 hconvergent_prod di])
using show8[OF hA > 1i d a hs > 0i hconvergent_prod di assu2]
by (simp add:algebra_simps)
qed
ultimately show False using not_le by blast
qed