No Eleventh Conditional Ingleton Inequality

Abstract

A rational probability distribution on four binary random variables $X, Y, Z, U$ is constructed which satisfies the conditional independence relations $[X ⊥ ⊥ Y]$, $[X ⊥ ⊥ Z | U]$, $[Y ⊥ ⊥ U | Z]$ and $[Z ⊥ ⊥ U | X Y]$ but whose entropy vector violates the Ingleton inequality. This settles a recent question of Studený (IEEE Trans. Inf. Theory vol. 67, no. 11) and shows that there are, up to symmetry, precisely ten inclusion-minimal sets of conditional independence assumptions on four discrete random variables which make the Ingleton inequality hold. The last case in the classification of which of these inequalities are essentially conditional is also settled.

1 Summary

This note answers Open Question 1 and one half of Open Question 2 raised by Milan Studený in his recent article [Citation19] on conditional Ingleton information inequalities on four discrete random variables $X, Y, Z, U$. The first result is the following rational binary distribution represented by its atomic probabilities $p_{ijkl} = P (X = i, Y = j, Z = k, U = l)$: $\begin{matrix} p_{0000} = \frac{20}{77}, & p_{0001} = 0, & p_{0010} = 0, & p_{0011} = 0, \\ p_{0100} = \frac{20}{693}, & p_{0101} = \frac{4}{99}, & p_{0110} = \frac{10}{693}, & p_{0111} = \frac{2}{99}, \\ p_{1000} = \frac{20}{693}, & p_{1001} = \frac{40}{99}, & p_{1010} = \frac{1}{693}, & p_{1011} = \frac{2}{99}, \\ p_{1100} = 0, & p_{1101} = 0, & p_{1110} = 0, & p_{1111} = \frac{2}{11}, \end{matrix}$ which satisfies solely the four conditional independence statements $[X ⊥ ⊥ Y]$, $[X ⊥ ⊥ Z | U]$, $[Y ⊥ ⊥ U | Z]$ and $[Z ⊥ ⊥ U | X Y]$ and on which the Ingleton expression evaluates to a negative number close to $- 0.00757$. This example settles simultaneously the last three open cases in the classification of CI-type conditional Ingleton inequalities on four discrete random variables and shows that all ten of them were already described in [Citation19].
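
The claims about this distribution can be checked mechanically. The following is a minimal sketch, not the code from the MathRepo page: it assumes Python with exact `Fraction` arithmetic for the CI tests and floating-point natural logarithms for the Ingleton expression. The helper names `marginal`, `is_ci`, `entropy` and `ingleton` are illustrative choices.

```python
from fractions import Fraction as F
from itertools import combinations, product
from math import log

VARS = "XYZU"
# Atomic probabilities p_{ijkl} = P(X=i, Y=j, Z=k, U=l); atoms not listed are zero.
P = {
    (0, 0, 0, 0): F(20, 77),
    (0, 1, 0, 0): F(20, 693), (0, 1, 0, 1): F(4, 99),
    (0, 1, 1, 0): F(10, 693), (0, 1, 1, 1): F(2, 99),
    (1, 0, 0, 0): F(20, 693), (1, 0, 0, 1): F(40, 99),
    (1, 0, 1, 0): F(1, 693), (1, 0, 1, 1): F(2, 99),
    (1, 1, 1, 1): F(2, 11),
}

def marginal(subset):
    """Exact marginal distribution of the variables named in `subset`."""
    idx = [VARS.index(v) for v in subset]
    out = {}
    for atom, pr in P.items():
        key = tuple(atom[i] for i in idx)
        out[key] = out.get(key, 0) + pr
    return out

def is_ci(i, j, K):
    """Exact test of [i _||_ j | K]: p(i,j,K) p(K) == p(i,K) p(j,K) for all states."""
    pijK, piK, pjK, pK = (marginal(s) for s in (i + j + K, i + K, j + K, K))
    for a, b, *k in product((0, 1), repeat=2 + len(K)):
        k = tuple(k)
        lhs = pijK.get((a, b) + k, 0) * pK.get(k, 0)
        rhs = piK.get((a,) + k, 0) * pjK.get((b,) + k, 0)
        if lhs != rhs:
            return False
    return True

def entropy(subset):
    """Shannon entropy (natural logarithm) of a marginal, in floating point."""
    return -sum(float(p) * log(float(p)) for p in marginal(subset).values() if p > 0)

def ingleton():
    """Ingleton expression with H in place of dim."""
    plus = ["XZ", "XU", "YZ", "YU", "ZU"]
    minus = ["XY", "Z", "U", "XZU", "YZU"]
    return sum(map(entropy, plus)) - sum(map(entropy, minus))

# All elementary CI statements [i _||_ j | K] which hold for the distribution
holding = [
    (i, j, "".join(K))
    for i, j in combinations(VARS, 2)
    for r in range(3)
    for K in combinations([v for v in VARS if v not in (i, j)], r)
    if is_ci(i, j, "".join(K))
]
print(holding)
print(ingleton())
```

The CI tests are exact, whereas the printed Ingleton value is only a floating-point approximation; an exact sign certificate is discussed in Section 3.3.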

With the knowledge of all conditional Ingleton inequalities, we also settle the last remaining case in the classification of their essential conditionality. The results are summarized in:

Theorem.

On four discrete random variables $X, Y, Z, U$ there are precisely ten inclusion-minimal conditional independence assumptions which make Ingleton’s inequality $□ (X Y | Z U) \geq 0$ hold for entropy vectors (up to the symmetries $X \leftrightarrow Y$ and $Z \leftrightarrow U$ in the Ingleton expression), namely:

(1.1) $[Z ⊥ ⊥ U]$
(1.2) $[X ⊥ ⊥ Z]$
(1.3) $[X ⊥ ⊥ Z | Y]$
(1.4) $[X ⊥ ⊥ Y | Z U]$
(1.5) $[X ⊥ ⊥ Z | Y U]$
(2.1) $[X ⊥ ⊥ Y] \land [X ⊥ ⊥ Y | Z]$
(2.2) $[X ⊥ ⊥ Y | Z] \land [Y ⊥ ⊥ U | Z]$
(2.3) $[X ⊥ ⊥ Z | U] \land [X ⊥ ⊥ U | Z]$
(2.4) $[X ⊥ ⊥ Z | U] \land [Z ⊥ ⊥ U | X]$
(2.5) $[X ⊥ ⊥ Z | U] \land [Y ⊥ ⊥ Z | U]$

The conditional Ingleton inequalities given by (1.1)–(1.5) are not essentially conditional but the ones given by (2.1)–(2.5) are essentially conditional.

These results are derived computationally. Section 2 gives an introduction to the topic of conditional Ingleton inequalities and recalls the previous results leading to the question answered here. For basics on polymatroids and their role in conditional independence and information theory we refer to the excellent exposition in [Citation19]. The computational methodologies used to find the above distribution and to prove essential conditionality of inequality (2.5) are explained in Sections 3 and 4, respectively. Section 5 collects further remarks and observations. The source code in Macaulay2 [Citation25] and Mathematica [Citation26] behind various steps in the computations and auxiliary data produced using 4ti2 [Citation22] and normaliz [Citation24] are available at https://mathrepo.mis.mpg.de/ConditionalIngleton/.

2 On conditional Ingleton inequalities

2.1 Ingleton inequality and entropy region

Suppose that $X, Y, Z, U$ are subspaces in a finite-dimensional (left or right) vector space over a field (or division ring). For this data, the Ingleton inequality asserts that (□) $\begin{aligned} 0 \leq {} & \dim \langle X, Z \rangle + \dim \langle X, U \rangle + \dim \langle Y, Z \rangle + \dim \langle Y, U \rangle + \dim \langle Z, U \rangle \\ & - \dim \langle X, Y \rangle - \dim \langle Z \rangle - \dim \langle U \rangle - \dim \langle X, Z, U \rangle - \dim \langle Y, Z, U \rangle, \end{aligned}$ where $\dim \langle - \rangle$ is the dimension of the subspace spanned by its arguments. This rank function $h : 2^{N} \to \mathbb{R}$, $I \mapsto \dim \langle I \rangle$ on subsets of $N = \{X, Y, Z, U\}$ is a polymatroid, i.e., it is normalized: $h (\emptyset) = 0$, nondecreasing: $h (I) \leq h (J)$ for $I \subseteq J$, and submodular: $h (I) + h (J) \geq h (I \cup J) + h (I \cap J)$ for all $I, J \subseteq N$. The set of polymatroids over an $n$-element set forms a rational polyhedral cone in $\mathbb{R}^{2^{n}}$ denoted by $H_{n}$. The Ingleton expression $□ (X Y | Z U)$ is the linear functional in $h$ which appears on the right-hand side of (□). Hence, non-negativity of the inner product $□ (X Y | Z U) \cdot h$ is a necessary condition for a polymatroid $h$ to be linearly representable over some division ring. The necessity was found by Ingleton [Citation6] through an analysis of the Vámos matroid, the prototypical example of a non-linear matroid.

Now let $X, Y, Z, U$ denote jointly distributed random variables which take only finitely many states. These random variables are referred to as discrete with finiteness being implicit. If $X$ attains $q$ states, without loss of generality from the set $[q] := \{1, \dots, q\}$, with positive probabilities $p (X = i)$, then its Shannon entropy is the expression $H (X) := \mathbb{E}_{X} [\log \frac{1}{p}] = \sum_{i = 1}^{q} p (X = i) \log \frac{1}{p (X = i)} .$

The entropy vector of jointly distributed discrete random variables $X_{1}, \dots, X_{n}$ assigns to each subset $I \subseteq [n]$ the entropy of the vector-valued discrete random variable $X_{I} := (X_{i} : i \in I)$. We denote the entropy region, the set of all points in $\mathbb{R}^{2^{n}}$ which occur as entropy vectors of $n$ discrete random variables, by $H_{n}^{*}$. The choice of base for the logarithm changes the scale of all entropy vectors and does not change any of the considerations in this paper.

Fujishige [Citation4] observed that the non-negativity of Shannon’s information measures implies that entropy vectors are polymatroids and thus entropy vectors are sometimes called entropic polymatroids. A result of Matúš [Citation9, Lemma 10] implies that every integer polymatroid which is linearly representable by a subspace arrangement over a field is a scalar multiple of an entropic one. Hence, it makes sense to reinterpret Ingleton’s functional $□ (X Y | Z U)$ by replacing $\dim \langle - \rangle$ with $H (-)$. But whereas the inequality $□ \geq 0$ is valid for linear polymatroids, it fails for the more general entropic ones. This paper is concerned with special types of assumptions on entropy vectors which guarantee that the Ingleton inequality holds.

2.2 Discrete representability of CI structures

Even though the Ingleton inequality does not hold universally for entropy vectors, it was a key tool in the characterization of conditional independence (CI) structures which are representable by four discrete random variables. This classification was achieved in the series of papers [Citation8, Citation10, Citation15] by Matúš and Studený and we use this section to outline the role of the Ingleton inequality in this work.

Let $I, J, K \subseteq [n]$, with the common shorthand notation $I J := I \cup J$ for unions of these subsets. For a polymatroid $h$, we employ the difference expression $Δ (I, J | K) \cdot h := h (I K) + h (J K) - h (I J K) - h (K),$ that is, $Δ (I, J | K)$ is a linear functional on $\mathbb{R}^{2^{n}}$. The non-negativity of this functional on $H_{n}$ is guaranteed by the submodular inequalities. Its vanishing makes $I K$ and $J K$ a modular pair. If $h$ is the entropy vector of random variables $(X_{i} : i \in [n])$, then $Δ (I, J | K)$ is known as the conditional mutual information of subvectors $X_{I}$ and $X_{J}$ given $X_{K}$ and its vanishing is equivalent to the conditional independence $[X_{I} ⊥ ⊥ X_{J} | X_{K}]$. Recall from [Citation19, Section II.D] that the study of conditional independence (excluding functional dependence) can be reduced to the elementary CI statements, i.e., the equalities $Δ (i, j | K) = 0$ where $i$ and $j$ are distinct singletons and $K$ is a subset of $N$ not containing $i$ or $j$. These functionals define facets of $H_{n}$ and even supporting hyperplanes of $H_{n}^{*}$ with non-empty intersection. A set $L$ of elementary CI statements on $n$ random variables, also called a CI structure, is representable if and only if there exists $h \in H_{n}^{*}$ such that $Δ (i, j | K) \cdot h = 0 \Leftrightarrow [i ⊥ ⊥ j | K] \in L$. The CI structure defined by any polymatroid $h$ in this way is denoted by $[[h]]$.

Let $H_{4}^{□}$ denote the subcone of $H_{4}$ (whose ground set elements are labeled $X, Y, Z, U$) which consists of polymatroids satisfying the Ingleton inequalities $□ (I J | K L) \geq 0$ for every possible permutation $I, J, K, L$ of $X, Y, Z, U$. There are $\binom{4}{2} = 6$ unique such inequalities because the Ingleton expression is invariant under exchanging $I \leftrightarrow J$ and $K \leftrightarrow L$. One key insight of [Citation15] is that the extreme rays of $H_{4}^{□}$ are a subset of those of $H_{4}$ and that they are all probabilistically representable. This implies that every CI structure $[[h]]$, for $h \in H_{4}^{□}$, is representable; this condition is of polyhedral nature and can easily be checked using linear programming. Miraculously, even in the non-Ingleton regime, the Ingleton inequality is the main obstruction to entropicness: namely, in [Citation10] sets of CI statements $L$ are described such that whenever a polymatroid $h$ is entropic and satisfies $L \subseteq [[h]]$, then $□ (X Y | Z U) \cdot h \geq 0$ holds. This is a conditional information inequality in the sense of [Citation7], formally written as $L \Rightarrow □ (X Y | Z U) \geq 0$ and called a conditional Ingleton inequality. It is important to emphasize that a conditional information inequality is not required to hold for general polymatroids (in which case it would be a consequence of the polyhedral geometry of $H_{4}$ and not very informative) but only for entropic polymatroids. An inequality such as $L \Rightarrow □ (X Y | Z U) \geq 0$ allows one to conclude that a CI structure containing $L$ cannot be representable if the cone of its realizing polymatroids does not intersect the cone given by $□ (X Y | Z U) \geq 0$, which is again a polyhedral condition that can be computed easily.

Convention. The definition of conditional information inequality in [Citation7] allows arbitrary linear assumptions $p_{1} \cdot h \geq 0 \land \dots \land p_{s} \cdot h \geq 0$ to imply a linear conclusion $q \cdot h \geq 0$ . Conditional independence assumptions are a special case of this using $△$ functionals. In this work, “conditional information inequality” will always refer to the special case of CI-type inequality.

While the precise shape of $H_{4}^{*}$ or even its closure $\bar{H_{4}^{*}}$ in the euclidean topology (which is known to be a convex cone [Citation21]) remains unknown to date (cf. [Citation12] and [Citation5] for a challenging open problem), conditional information inequalities help to delimit it in ways that go beyond linear inequalities and hence make it possible to describe differences between the entropy region and its closure. This becomes significant, for example, when information-theoretic optimization problems such as channel capacity computations are solved not in terms of their original parameters and non-linear objective functions but in terms of linear programs over the entropy region; this is done in Shannon’s original paper [Citation17, Theorem 10] and has since then become a standard technique. In this case, the optimum is attained on the boundary of $\bar{H_{n}^{*}}$ . Even if it can be located, it is not clear whether the optimizer is entropic and hence corresponds to a real probability distribution or if it can only be approximated arbitrarily well by distributions.

The knowledge of which CI structures are representable can be viewed as combinatorial information about the intricate boundary structure of $H_{4}^{*}$. Namely, given a set of CI assumptions $L$ which define a subspace $U = \{h \in \mathbb{R}^{16} : Δ (i, j | K) \cdot h = 0 \text{ for all } [i ⊥ ⊥ j | K] \in L\}$, the question is which other inequalities $△ \geq 0$ are tight at every point in $H_{n}^{*} \cap U$? Calling the set of implied statements $M$, this proves a conditional independence inference rule $L \Rightarrow M$ for representable CI structures. Unlike the geometric shape of $H_{4}^{*}$, this combinatorial, CI-theoretic information about its boundary is completely available due to the series of papers by Matúš and Studený. Studený’s recent paper [Citation19] revisits this series and shows that all inference properties for four discrete random variables can be deduced from conditional Ingleton inequalities in addition to the common Shannon information inequalities. Each of the 10 conditional Ingleton inequalities presented in [Citation19] is necessary to obtain all the CI inference rules. In this paper we prove that there are no further, in the CI-theoretic sense “extraneous,” conditional Ingleton inequalities.

2.3 Masks and conditional Ingleton inequalities

One way to obtain conditional Ingleton inequalities is to rewrite the functional $□ (X Y | Z U)$ as a linear combination of difference expressions $Δ (i, j | K)$ in the dual space ${(\mathbb{R}^{16})}^{*}$. Some of these masks of the Ingleton expression were found in [Citation15] and are also discussed in [Citation19, Section II.G]: $\begin{aligned} □ (X Y | Z U) &= Δ (Z, U | X) + Δ (Z, U | Y) + Δ (X, Y) - Δ (Z, U) && \text{(M.1)} \\ &= Δ (Z, U | Y) + Δ (X, Z | U) + Δ (X, Y) - Δ (X, Z) && \text{(M.2)} \\ &= Δ (X, Y | Z) + Δ (X, Z | U) + Δ (Z, U | Y) - Δ (X, Z | Y) && \text{(M.3)} \\ &= Δ (X, Y | Z) + Δ (X, Y | U) + Δ (Z, U | X Y) - Δ (X, Y | Z U) && \text{(M.4)} \\ &= Δ (X, Y | Z) + Δ (X, Z | U) + Δ (Z, U | X Y) - Δ (X, Z | Y U) . && \text{(M.5)} \end{aligned}$

These masks prove (1.1)–(1.5); indeed mask (M.1), for example, implies (1.1): $[Z ⊥ ⊥ U] \Rightarrow □ (X Y | Z U) \geq 0$ due to the non-negativity of all difference expressions. Under the symmetries $X \leftrightarrow Y$ and $Z \leftrightarrow U$ which fix $□ (X Y | Z U)$, these five masks generate 14 distinct conditional Ingleton inequalities, displayed below in groups by symmetry class: $\begin{matrix} ([Z ⊥ ⊥ U]), ([X ⊥ ⊥ Z], [Y ⊥ ⊥ Z], [X ⊥ ⊥ U], [Y ⊥ ⊥ U]), \\ ([X ⊥ ⊥ Z | Y], [Y ⊥ ⊥ Z | X], [X ⊥ ⊥ U | Y], [Y ⊥ ⊥ U | X]), ([X ⊥ ⊥ Y | Z U]), \\ ([X ⊥ ⊥ Z | Y U], [Y ⊥ ⊥ Z | X U], [X ⊥ ⊥ U | Y Z], [Y ⊥ ⊥ U | X Z]) . \end{matrix}$
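
Each mask is a linear identity between functionals on $\mathbb{R}^{16}$ and can be confirmed by expanding the definition of $Δ$. The sketch below does this in Python by representing a functional as a sparse coefficient vector indexed by subsets of $\{X, Y, Z, U\}$. The helper names `h`, `combine` and `delta` are illustrative and not taken from the paper's code.

```python
from collections import Counter

def h(subset, coeff=1):
    """coeff * h(subset) as a sparse coefficient vector; h(empty set) = 0 is dropped."""
    return {frozenset(subset): coeff} if subset else {}

def combine(*signed_vectors):
    """Sum of sign * vector over the given (sign, vector) pairs, zero entries removed."""
    total = Counter()
    for sign, vec in signed_vectors:
        for key, coeff in vec.items():
            total[key] += sign * coeff
    return {key: coeff for key, coeff in total.items() if coeff != 0}

def delta(i, j, K=""):
    """Difference expression: h(iK) + h(jK) - h(ijK) - h(K)."""
    return combine((1, h(i + K)), (1, h(j + K)), (-1, h(i + j + K)), (-1, h(K)))

# Ingleton expression read off from the right-hand side of the Ingleton inequality
INGLETON = combine(
    *[(1, h(s)) for s in ("XZ", "XU", "YZ", "YU", "ZU")],
    *[(-1, h(s)) for s in ("XY", "Z", "U", "XZU", "YZU")],
)

MASKS = {
    "M.1": [(1, delta("Z", "U", "X")), (1, delta("Z", "U", "Y")),
            (1, delta("X", "Y")), (-1, delta("Z", "U"))],
    "M.2": [(1, delta("Z", "U", "Y")), (1, delta("X", "Z", "U")),
            (1, delta("X", "Y")), (-1, delta("X", "Z"))],
    "M.3": [(1, delta("X", "Y", "Z")), (1, delta("X", "Z", "U")),
            (1, delta("Z", "U", "Y")), (-1, delta("X", "Z", "Y"))],
    "M.4": [(1, delta("X", "Y", "Z")), (1, delta("X", "Y", "U")),
            (1, delta("Z", "U", "XY")), (-1, delta("X", "Y", "ZU"))],
    "M.5": [(1, delta("X", "Y", "Z")), (1, delta("X", "Z", "U")),
            (1, delta("Z", "U", "XY")), (-1, delta("X", "Z", "YU"))],
}

# A mask is valid if its expansion has exactly the Ingleton coefficients
results = {name: combine(*terms) == INGLETON for name, terms in MASKS.items()}
print(results)
```

Since each mask has a single negative term, assuming that this term vanishes leaves a sum of non-negative difference expressions, which is how the masks yield (1.1)–(1.5).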

In [Citation19, Section IV] five further conditional Ingleton inequalities are proved which require two CI assumptions. They expand to fourteen conditional inequalities under symmetry as well. Studený has ruled out five other sets of CI assumptions by counterexamples and reduced the possibilities for an eleventh conditional Ingleton inequality to three CI structures, namely the sets strictly above $L_{0} = [X ⊥ ⊥ Z | U] \land [Y ⊥ ⊥ U | Z]$ and below $L = [X ⊥ ⊥ Z | U] \land [Y ⊥ ⊥ U | Z] \land [X ⊥ ⊥ Y] \land [Z ⊥ ⊥ U | X Y]$ .

The verification of this claim by hand is tedious. The process can be delegated to a SAT solver such as CaDiCaL [Citation23] as follows. There are 24 elementary CI statements $[i ⊥ ⊥ j | K]$ on four random variables; introduce one boolean variable for each of them. If a CI structure implies the Ingleton inequality, then so does every superset. If a counterexample exists for a set of CI assumptions, then every subset is ruled out by the same counterexample. Using the 10 known conditional Ingleton inequalities, Studený’s five counterexamples and the conjectured minimal and maximal unsolved cases $L_{0}$ and $L$ (together with all their symmetric variants), a boolean formula can be constructed whose satisfying assignments are all CI structures which are not covered and are potential assumptions for an eleventh conditional Ingleton inequality. The solver quickly decides that the formula is unsatisfiable and hence proves that all unsolved cases are between $L_{0}$ and $L$. More details and source code for this computation are available on our MathRepo page.
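
The combinatorial ingredients of this encoding can be illustrated without a SAT solver. The following Python sketch covers only the "known inequalities" half of the argument: it enumerates the 24 elementary CI statements, expands the ten known premises under the symmetries $X \leftrightarrow Y$ and $Z \leftrightarrow U$, and tests whether a CI structure contains one of them. Studený's counterexamples and the actual boolean formula are not reproduced, and all function names are illustrative assumptions.

```python
from itertools import combinations

VARS = "XYZU"

def ci(i, j, K=""):
    """Canonical form of [i _||_ j | K]: (unordered pair, conditioning set)."""
    return (frozenset(i + j), frozenset(K))

# The 24 elementary CI statements: 6 pairs times 4 conditioning sets
ELEMENTARY = [
    ci(i, j, "".join(K))
    for i, j in combinations(VARS, 2)
    for r in range(3)
    for K in combinations([v for v in VARS if v not in (i, j)], r)
]

# Symmetries fixing the Ingleton expression: identity, X<->Y, Z<->U, and both
SWAPS = [
    {},
    {"X": "Y", "Y": "X"},
    {"Z": "U", "U": "Z"},
    {"X": "Y", "Y": "X", "Z": "U", "U": "Z"},
]

def act(perm, stmt):
    """Apply a relabeling of the variables to a CI statement."""
    pair, cond = stmt
    return (frozenset(perm.get(v, v) for v in pair),
            frozenset(perm.get(v, v) for v in cond))

def orbit(premise):
    """All symmetric variants of a set of CI assumptions."""
    return {frozenset(act(g, s) for s in premise) for g in SWAPS}

# Premises (1.1)-(1.5) and (2.1)-(2.5) of the theorem
KNOWN = [
    {ci("Z", "U")}, {ci("X", "Z")}, {ci("X", "Z", "Y")},
    {ci("X", "Y", "ZU")}, {ci("X", "Z", "YU")},
    {ci("X", "Y"), ci("X", "Y", "Z")},
    {ci("X", "Y", "Z"), ci("Y", "U", "Z")},
    {ci("X", "Z", "U"), ci("X", "U", "Z")},
    {ci("X", "Z", "U"), ci("Z", "U", "X")},
    {ci("X", "Z", "U"), ci("Y", "Z", "U")},
]
VARIANTS = set().union(*(orbit(p) for p in KNOWN))

def implies_ingleton(structure):
    """Monotonicity: a structure is covered if it contains some known premise."""
    return any(v <= structure for v in VARIANTS)

L = frozenset({ci("X", "Y"), ci("X", "Z", "U"), ci("Y", "U", "Z"), ci("Z", "U", "XY")})
print(len(ELEMENTARY), len(VARIANTS), implies_ingleton(L))
```

The structure $L$ contains none of the 28 symmetric variants, which is why it is not decided by the known inequalities alone.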

The objective of the next section is to construct a probability distribution satisfying $L$ and violating the Ingleton inequality. Known examples of this kind are usually hand-crafted, rational distributions with small denominators derived by careful exploitation of zero patterns and symmetries; cf. [Citation7, Citation19]. We present a different, computer-assisted and heuristic methodology to find counterexamples in information theory rooted in algebra and relying on symbolic computations as well as numerical non-linear optimization.

3 Construction of the distribution

3.1 Circuits, masks and scores

The $24$ difference expressions $Δ (i, j | K)$ and the Ingleton expression $□ (X Y | Z U)$ are elements in the dual space ${(\mathbb{R}^{16})}^{*}$. Choosing the standard basis there, they can be identified with vectors which make up the columns of a $16 \times 25$ matrix. The circuits of this matrix, i.e., the nonzero integer vectors in its kernel with inclusion-minimal support and coprime nonzero entries, can be computed using the software 4ti2 [Citation22]; cf. [Citation18, Chapter 4]. There are $10\,481$ such circuits, among them $6\,814$ which give a nonzero coefficient to $□ (X Y | Z U)$. These circuits are the shortest possible ways of writing $□$ as a linear combination of $△$. The 14 shortest circuits require only four $△$ terms, one of which has a negative coefficient; they are precisely the 14 symmetric images of (M.1)–(M.5). All masks are available on our website.

Based on the circuits, we obtain short masks which are closely related to the two subcases $L_{1} = [Z ⊥ ⊥ U | X Y] \land L_{0}$ and $L_{2} = [X ⊥ ⊥ Y] \land L_{0}$ of the model $L = L_{1} \land L_{2}$. All three cases remained open in Studený’s analysis, but $L_{0}$ was settled in [Citation19, Example 5]. The mask (†1) $\begin{aligned} □ (X Y | Z U) = {} & Δ (X, Y | Z U) + Δ (X, Z | U) - Δ (X, Z | Y U) + Δ (Y, U | Z) \\ & - Δ (Y, U | X Z) + Δ (Z, U | X Y) \end{aligned}$ can be confirmed by plugging in the definitions of $△$ and $□$. It was selected to simplify as much as possible under the CI assumptions $L_{1}$ which would otherwise contribute positive quantities to the Ingleton expression. Given that $L_{1}$ holds, the mask (†1) yields (‡1) $\begin{aligned} - □ (X Y | Z U) &= Δ (X, Z | Y U) + Δ (Y, U | X Z) - Δ (X, Y | Z U) \\ &= H (Y | X Z) + H (X | Y U) - H (X Y | Z U) = : ϱ_{1} (X, Y, Z, U) . \end{aligned}$

Analogously one proves (†2) $□ (X Y | Z U) = Δ (X, Y) - Δ (X, Z) + Δ (X, Z | U) - Δ (Y, U) + Δ (Y, U | Z) + Δ (Z, U),$

which under $L_{2}$ yields (‡2) $\begin{aligned} - □ (X Y | Z U) &= Δ (X, Z) + Δ (Y, U) - Δ (Z, U) \\ &= H (Z U) - H (Z | X) - H (U | Y) = : ϱ_{2} (X, Y, Z, U) . \end{aligned}$

The functions $ϱ_{1}$ and $ϱ_{2}$ are referred to as the non-Ingleton scores on $L_{1}$ and $L_{2}$, respectively. On the distributions satisfying the respective CI statements, they equal the value of $- □ (X Y | Z U)$ but they involve fewer terms and are thus easier to evaluate and to differentiate. Both scores coincide on the intersection $L$ of the models $L_{1}$ and $L_{2}$.
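
The masks (†1) and (†2) and the entropy forms of the two scores are again linear identities, so they can be checked by comparing coefficients. A minimal Python sketch follows; the helpers `H`, `add` and `delta` are illustrative names, and conditional entropies are expanded as $H(A | B) = h(A B) - h(B)$.

```python
from collections import Counter

def H(subset, coeff=1):
    """coeff * h(subset) as a sparse coefficient vector; h(empty set) = 0 is dropped."""
    return {frozenset(subset): coeff} if subset else {}

def add(*vectors):
    """Sum of sparse coefficient vectors with zero entries removed."""
    total = Counter()
    for vec in vectors:
        for key, coeff in vec.items():
            total[key] += coeff
    return {key: coeff for key, coeff in total.items() if coeff != 0}

def delta(i, j, K="", sign=1):
    """sign * (h(iK) + h(jK) - h(ijK) - h(K))."""
    return add(H(i + K, sign), H(j + K, sign), H(i + j + K, -sign), H(K, -sign))

INGLETON = add(
    *(H(s) for s in ("XZ", "XU", "YZ", "YU", "ZU")),
    *(H(s, -1) for s in ("XY", "Z", "U", "XZU", "YZU")),
)

dagger1 = add(delta("X", "Y", "ZU"), delta("X", "Z", "U"), delta("X", "Z", "YU", -1),
              delta("Y", "U", "Z"), delta("Y", "U", "XZ", -1), delta("Z", "U", "XY"))
dagger2 = add(delta("X", "Y"), delta("X", "Z", sign=-1), delta("X", "Z", "U"),
              delta("Y", "U", sign=-1), delta("Y", "U", "Z"), delta("Z", "U"))

# Scores: what remains of -Ingleton once the terms vanishing under L1 resp. L2 are dropped
rho1_delta = add(delta("X", "Z", "YU"), delta("Y", "U", "XZ"), delta("X", "Y", "ZU", -1))
rho1_entropy = add(H("XYZ"), H("XZ", -1),       # H(Y | XZ)
                   H("XYU"), H("YU", -1),       # H(X | YU)
                   H("XYZU", -1), H("ZU"))      # -H(XY | ZU)
rho2_delta = add(delta("X", "Z"), delta("Y", "U"), delta("Z", "U", sign=-1))
rho2_entropy = add(H("ZU"),
                   H("XZ", -1), H("X"),         # -H(Z | X)
                   H("YU", -1), H("Y"))         # -H(U | Y)

checks = {
    "dagger1": dagger1 == INGLETON,
    "dagger2": dagger2 == INGLETON,
    "rho1": rho1_delta == rho1_entropy,
    "rho2": rho2_delta == rho2_entropy,
}
print(checks)
```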

We continue with a geometric analysis of the space of binary distributions in the model $L_{1}$ and extend these findings to derive a binary distribution for $L$ with positive non-Ingleton score.

3.2 Parametrization of $L_{1}$

A joint distribution of four binary random variables is given by a $2 \times 2 \times 2 \times 2$ tensor with real, nonnegative entries $p_{ijkl}$ which sum to one. With all four indices ranging in $\{0, 1\}$, these represent the atomic probabilities of the sixteen joint events. The CI statements of $L_{1}$ prescribe quadratic equations on these probabilities: $\begin{aligned} [Z ⊥ ⊥ U | X Y] &\Leftrightarrow \begin{cases} p_{0000} \cdot p_{0011} = p_{0001} \cdot p_{0010}, \\ p_{0100} \cdot p_{0111} = p_{0101} \cdot p_{0110}, \\ p_{1000} \cdot p_{1011} = p_{1001} \cdot p_{1010}, \\ p_{1100} \cdot p_{1111} = p_{1101} \cdot p_{1110}, \end{cases} \\ [X ⊥ ⊥ Z | U] &\Leftrightarrow \begin{cases} (p_{0000} + p_{0100}) \cdot (p_{1010} + p_{1110}) = (p_{0010} + p_{0110}) \cdot (p_{1000} + p_{1100}), \\ (p_{0001} + p_{0101}) \cdot (p_{1011} + p_{1111}) = (p_{0011} + p_{0111}) \cdot (p_{1001} + p_{1101}), \end{cases} \\ [Y ⊥ ⊥ U | Z] &\Leftrightarrow \begin{cases} (p_{0000} + p_{1000}) \cdot (p_{0101} + p_{1101}) = (p_{0001} + p_{1001}) \cdot (p_{0100} + p_{1100}), \\ (p_{0010} + p_{1010}) \cdot (p_{0111} + p_{1111}) = (p_{0011} + p_{1011}) \cdot (p_{0110} + p_{1110}) . \end{cases} \end{aligned}$

These equations are studied in algebraic statistics; see [Citation20, Proposition 4.1.6] for their derivation. It is in general difficult to derive a rational parameterization of a given CI model. To simplify this task, we impose the support pattern which already appears in [Citation19, Example 5]: suppose that $p_{0001} = p_{0010} = p_{0011} = p_{1100} = p_{1101} = p_{1110} = 0$ and all other variables are positive. From now on, we regard only this linear slice of the CI models for $L_{1}, L_{2}$ , and $L$ .

Under these additional constraints, the above eight equations together with the condition that all probabilities sum to one can be resolved to yield the rational parameterization (*) $\begin{matrix} p_{0100} = \frac{p_{0101} p_{0110}}{p_{0111}}, & p_{1000} = \frac{p_{1001} p_{1010}}{p_{1011}}, \\ p_{0111} = \frac{p_{0101} (p_{1011} + p_{1111})}{p_{1001}}, & p_{0000} = \frac{p_{0110} p_{1001} p_{1111}}{p_{1011} (p_{1011} + p_{1111})}, \\ p_{1010} = \frac{p_{0110} p_{1011}}{p_{1011} + p_{1111}}, & p_{0101} = \frac{p_{1001} p_{1011}}{p_{1011} + p_{1111}}, \end{matrix}$ $p_{1001} = \frac{p_{1011}^{2} (1 - 2 p_{0110} - 2 p_{1011}) + p_{1011} p_{1111} (1 - p_{0110} - 3 p_{1011} - p_{1111})}{(p_{0110} + p_{1011}) (2 p_{1011} + p_{1111})} .$ Here the expression for $p_{0111}$ is the second equation of $[X ⊥ ⊥ Z | U]$ under the zero constraints, $p_{0101} (p_{1011} + p_{1111}) = p_{0111} p_{1001}$.

With six zero conditions and seven equations (two of the CI equations trivialize under the zero constraints), this leaves the three parameters $p_{0110}, p_{1011}$ , and $p_{1111}$ . The positivity conditions on the ten nonzero probabilities turn into non-linear inequalities and these are the only remaining constraints on the parameters. Thus, this defines a three-dimensional basic semialgebraic set $T_{1}$ .
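
As a sanity check on the parameterization, the ten nonzero probabilities can be rebuilt from the three free parameters in exact arithmetic. The Python sketch below uses a hypothetical helper `distribution`; it evaluates $p_{1001}$ first and computes $p_{0111}$ from the $[X ⊥ ⊥ Z | U]$ equation for $U = 1$, that is $p_{0101} (p_{1011} + p_{1111}) = p_{0111} p_{1001}$. It is evaluated at the parameters of the rational distribution from Section 1.

```python
from fractions import Fraction as F

def distribution(p0110, p1011, p1111):
    """Nonzero atoms of a point in the slice of L1, from its three free parameters."""
    s = p1011 + p1111
    p1001 = (
        p1011**2 * (1 - 2 * p0110 - 2 * p1011)
        + p1011 * p1111 * (1 - p0110 - 3 * p1011 - p1111)
    ) / ((p0110 + p1011) * (2 * p1011 + p1111))
    p0101 = p1001 * p1011 / s
    p1010 = p0110 * p1011 / s
    p0000 = p0110 * p1001 * p1111 / (p1011 * s)
    p0111 = p0101 * s / p1001          # [X _||_ Z | U = 1] under the zero pattern
    p0100 = p0101 * p0110 / p0111      # [Z _||_ U | X = 0, Y = 1]
    p1000 = p1001 * p1010 / p1011      # [Z _||_ U | X = 1, Y = 0]
    return {
        "0000": p0000, "0100": p0100, "0101": p0101, "0110": p0110, "0111": p0111,
        "1000": p1000, "1001": p1001, "1010": p1010, "1011": p1011, "1111": p1111,
    }

# Parameters of the rational distribution stated in Section 1
dist = distribution(F(10, 693), F(2, 99), F(2, 11))
for atom, prob in sorted(dist.items()):
    print(atom, prob)
print("total:", sum(dist.values()))
```

With `Fraction` inputs all arithmetic is exact, so the printed total can be compared with one directly.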

3.3 Numerical optimization and a rational point

The Ingleton expression is not an algebraic function of the parameters but a transcendental one. Hence, algebraic techniques like Gröbner bases or cylindrical algebraic decomposition cannot be directly applied to decide if there exist parameters on which $□ (X Y | Z U)$ is negative. This question can be reformulated as whether a system of integer polynomial equations and inequalities in variables and exponentials of variables has a real solution. Thus, it is a question in the existential theory of the real numbers with exponentiation. The decidability of this theory is an open problem known as Tarski’s Exponential Function Problem and hence no general symbolic algorithms are available today to solve it; see [Citation16] for a starting point on this topic.

Instead of symbolic techniques, we employ optimization. Mathematica’s FindMaximum function, when started on the values $(\frac{1}{16}, \frac{1}{16}, \frac{1}{16})$, numerically finds a local maximum of $ϱ_{1}$ on $T_{1}$ with value $0.0198$ at the parameters $p_{0110} = 0.36179$, $p_{1011} = 0.01463$ and $p_{1111} = 0.27455$. By continuity, $ϱ_{1}$ remains positive in a small neighborhood of this point. Searching for a local minimum of $ϱ_{1}$ in the range (▭) $\frac{1}{6} \leq p_{0110} \leq \frac{3}{6}, \frac{1}{160} \leq p_{1011} \leq \frac{3}{160}, \frac{1}{8} \leq p_{1111} \leq \frac{3}{8}$ yields a positive value, indicating that this region is likely to contain many points violating the Ingleton inequality. Based on this heuristic, we want to find a distribution in this range which satisfies the system $T$ consisting of the inequalities of $T_{1}$ and the additional CI equation for $[X ⊥ ⊥ Y]$ which rewrites under the parameterization (*) to $\begin{matrix} p_{1011}^{2} {(p_{1011} + p_{1111})}^{3} + p_{0110}^{2} p_{1111} (2 p_{1011}^{3} + p_{1111}^{4} + p_{1011} p_{1111}^{2} (1 + 4 p_{1111}) + p_{1011}^{2} p_{1111} (3 + 4 p_{1111})) + \\ p_{0110} (p_{1011}^{4} + 5 p_{1011} p_{1111}^{5} + p_{1111}^{6} + 2 p_{1011}^{3} (p_{1111} + 2 p_{1111}^{3}) + p_{1011}^{2} (p_{1111}^{2} + 8 p_{1111}^{4})) \\ = p_{1111} (2 p_{1011}^{2} + 3 p_{1011} p_{1111} + p_{1111}^{2}) (p_{1011}^{3} + p_{1011}^{2} p_{1111} + p_{0110} p_{1111}^{2}) . \end{matrix}$

This equation can be resolved for $p_{0110} = f (p_{1011}, p_{1111})$ where $f$ is a (lengthy) algebraic function involving rational functions of its arguments and a single square root. The system $T$ together with the bounds (▭) defines a semialgebraic set and Mathematica’s FindInstance function quickly returns a solution, typically with large denominators and an algebraic number of extension degree 2 over $\mathbb{Q}$. This distribution proves $L \not\Rightarrow □ (X Y | Z U) \geq 0$. A rough map of where such counterexamples lie in the space $T$ is given in Fig. 1.

Fig. 1 The model $L$ in its $(p_{1111}, p_{1011})$ -parameter space $T$ . Points with a positive non-Ingleton score $ϱ_{2}$ are colored in red. The rational non-Ingleton distribution with $p_{1011} = \frac{2}{99}$ and $p_{1111} = \frac{2}{11}$ is marked with a black dot.

However, to confirm the Ingleton violation without numerical approximations, we seek a distribution with rational probabilities. The distribution is rational if $p_{0110}, p_{1011}, p_{1111}$ can be chosen rational, which hinges on the square root in the algebraic function $f$ determining $p_{0110}$. The term under the square root, expressed in $p_{1011} = \frac{a}{b}$ and $p_{1111} = \frac{c}{d}$ with $a, b, c, d \in \mathbb{N}$, reads $\begin{matrix} \frac{1}{b^{8} d^{12}} (b^{8} c^{12} + 10 a b^{7} c^{11} d - 2 b^{8} c^{11} d + 41 a^{2} b^{6} c^{10} d^{2} - 16 a b^{7} c^{10} d^{2} + b^{8} c^{10} d^{2} + 88 a^{3} b^{5} c^{9} d^{3} - 46 a^{2} b^{6} c^{9} d^{3} + \\ 6 a b^{7} c^{9} d^{3} + 104 a^{4} b^{4} c^{8} d^{4} - 44 a^{3} b^{5} c^{8} d^{4} + 11 a^{2} b^{6} c^{8} d^{4} + 64 a^{5} b^{3} c^{7} d^{5} + 44 a^{4} b^{4} c^{7} d^{5} + 2 a^{3} b^{5} c^{7} d^{5} - \\ 2 a^{2} b^{6} c^{7} d^{5} + 16 a^{6} b^{2} c^{6} d^{6} + 136 a^{5} b^{3} c^{6} d^{6} - 6 a^{4} b^{4} c^{6} d^{6} - 14 a^{3} b^{5} c^{6} d^{6} + 112 a^{6} b^{2} c^{5} d^{7} + 26 a^{5} b^{3} c^{5} d^{7} - \\ 42 a^{4} b^{4} c^{5} d^{7} + 32 a^{7} b c^{4} d^{8} + 68 a^{6} b^{2} c^{4} d^{8} - 70 a^{5} b^{3} c^{4} d^{8} + a^{4} b^{4} c^{4} d^{8} + 56 a^{7} b c^{3} d^{9} - 68 a^{6} b^{2} c^{3} d^{9} + \\ 4 a^{5} b^{3} c^{3} d^{9} + 16 a^{8} c^{2} d^{10} - 36 a^{7} b c^{2} d^{10} + 6 a^{6} b^{2} c^{2} d^{10} - 8 a^{8} c d^{11} + 4 a^{7} b c d^{11} + a^{8} d^{12}) . \end{matrix}$

The denominator is always a square, so it suffices to find, in accordance with (▭), four positive integers with $b \leq 160 a \leq 3 b$ and $d \leq 8 c \leq 3 d$ which make the parenthesized numerator into a square. An exhaustive search through small denominators $b, d$ turns up $p_{1011} = \frac{2}{99}$ and $p_{1111} = \frac{2}{11}$ satisfying this criterion, because the numerator then evaluates to the perfect square $937\,129\,691\,803\,487\,846\,400 = 30\,612\,574\,080^{2}$. The resulting rational value $p_{0110} = f (\frac{2}{99}, \frac{2}{11}) = \frac{10}{693}$ does not satisfy (▭) but it still yields a positive non-Ingleton score. To see this, consider the score $ϱ_{2}$ of the distribution with the given parameters, write all fractions with their common denominator $693$ and assemble all terms under one $\log \sqrt[693]{-}$. Then from ${(\exp ϱ_{2})}^{693} = \frac{24^{24} \cdot 30^{30} \cdot 141^{141} \cdot 168^{168} \cdot 201^{201} \cdot 228^{228} \cdot 294^{294} \cdot 300^{300} \cdot 693^{693}}{11^{11} \cdot 154^{154} \cdot 198^{198} \cdot 220^{220} \cdot 252^{252} \cdot 308^{308} \cdot 441^{441} \cdot 495^{495}}$ the violation of the Ingleton inequality is just a matter of comparing the integers in the numerator and denominator, a standard task which every computer algebra system with exact arithmetic on big integers will perform. The former is approximately $219.148 \cdot 10^{5190}$ and the latter $1.14751 \cdot 10^{5190}$. Thus, the fraction is greater than one and the non-Ingleton score is positive. Numerically, the score and hence the negative of the Ingleton expression $□ (X Y | Z U)$ is approximately $0.00757$. The distribution in its entirety is given in the beginning of this note.
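
Both exact checks in this paragraph need nothing beyond big-integer arithmetic. The Python sketch below transcribes the parenthesized numerator as a list of (coefficient, exponent of $a$, exponent of $c$) triples, where the exponents of $b$ and $d$ are $8$ and $12$ minus these. It then tests the perfect-square property at $a = 2$, $b = 99$, $c = 2$, $d = 11$ and compares the two large integers. The names `TERMS`, `radicand_numerator` and `is_square` are illustrative.

```python
from math import isqrt, prod

# (coefficient, exponent i of a, exponent k of c); b has exponent 8 - i, d has 12 - k
TERMS = [
    (1, 0, 12), (10, 1, 11), (-2, 0, 11), (41, 2, 10), (-16, 1, 10), (1, 0, 10),
    (88, 3, 9), (-46, 2, 9), (6, 1, 9), (104, 4, 8), (-44, 3, 8), (11, 2, 8),
    (64, 5, 7), (44, 4, 7), (2, 3, 7), (-2, 2, 7), (16, 6, 6), (136, 5, 6),
    (-6, 4, 6), (-14, 3, 6), (112, 6, 5), (26, 5, 5), (-42, 4, 5), (32, 7, 4),
    (68, 6, 4), (-70, 5, 4), (1, 4, 4), (56, 7, 3), (-68, 6, 3), (4, 5, 3),
    (16, 8, 2), (-36, 7, 2), (6, 6, 2), (-8, 8, 1), (4, 7, 1), (1, 8, 0),
]

def radicand_numerator(a, b, c, d):
    """Numerator of the term under the square root for p1011 = a/b, p1111 = c/d."""
    return sum(co * a**i * b**(8 - i) * c**k * d**(12 - k) for co, i, k in TERMS)

def is_square(n):
    """Exact perfect-square test using the integer square root."""
    return n >= 0 and isqrt(n) ** 2 == n

value = radicand_numerator(2, 99, 2, 11)
print(value, isqrt(value), is_square(value))

# Exact sign certificate for the non-Ingleton score: compare the two integers
NUM = [24, 30, 141, 168, 201, 228, 294, 300, 693]
DEN = [11, 154, 198, 220, 252, 308, 441, 495]
numerator = prod(n**n for n in NUM)
denominator = prod(n**n for n in DEN)
print(numerator > denominator)
```

The same `is_square` test could drive a search over small $a, b, c, d$; the loop bounds of such a search are left out here.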

4 Classification of essentially conditional Ingleton inequalities

4.1 Essential conditionality

The second part of our theorem concerns essential conditionality, a notion introduced in [Citation7]. Given a conditional information inequality $L \Rightarrow □ (X Y | Z U) \geq 0$ one may ask if it arises from a valid unconditional information inequality of the form ($□_{λ}$) $□ (X Y | Z U) + \sum_{[i ⊥ ⊥ j | K] \in L} λ_{[i ⊥ ⊥ j | K]} Δ (i, j | K) \geq 0,$ with Lagrange multipliers $λ_{[i ⊥ ⊥ j | K]} \geq 0$. The existence of multipliers which make ($□_{λ}$) a valid information inequality constitutes an “unconditional” proof of the conditional inequality $L \Rightarrow □ (X Y | Z U) \geq 0$; otherwise this inequality is essentially conditional. The masks (M.1)–(M.5) show that the conditional Ingleton inequalities (1.1)–(1.5) are in fact not essentially conditional. Among the first examples of essentially conditional inequalities due to Kaced and Romashchenko [Citation7] are the conditional Ingleton inequalities (2.1)–(2.4). Hence, the only remaining case in the classification of essential conditionality for conditional Ingleton inequalities is the inequality (2.5) which was recently discovered by Studený [Citation19].

Remark 4.1.

All unconditional information inequalities are valid for almost-entropic polymatroids, i.e., points of the closure $\bar{H_{n}^{*}}$ . This is not clear for essentially conditional inequalities and [Citation7, Section V] proves that (2.1) does not hold almost-entropically but (2.3) and (2.4) do.

4.2 Sampling for a counterexample

If $λ$ is a tuple of Lagrange multipliers that makes ($□_{λ}$) true and $μ \geq λ$ componentwise, then $μ$ also makes ($□_{λ}$) true since the $△$ functionals are nonnegative on the entropy region. Hence there is no loss of generality in assuming that all multipliers are equal and arbitrarily large but fixed. To prove essential conditionality we construct counterexamples to ($□_{λ}$) depending continuously on $λ \to \infty$, i.e., a curve of counterexamples. The curves proving essential conditionalities in [Citation7] all follow a simple combinatorial recipe:

(1) Commit to state space sizes for all four random variables; usually they are all assumed to be binary. This gives rise to 16 real parameters $P = \{p_{0000}, \dots, p_{1111}\}$.
(2) Choose a partition of $P$ into four subsets $A, B, C, D$ and assign to the elements of each subset probabilities depending on a real, positive parameter $ε \to 0$ (the table specifying these probabilities is not reproduced here). To ensure that the result is a probability distribution we require $| A \cup B | > 0$ and $| C | > 0$.

A curve of this type converges to a distribution which is uniform on its support. It is well-known [Citation3] that every invalid information inequality can be refuted by such a distribution; however, this result requires unbounded state spaces. The typical argument in [Citation7] expands the terms in ($□_{λ}$) as power series in $ε$ around zero and compares convergence orders to conclude that a small enough value of $ε$ leads to a violation of the inequality.

Sampling distributions according to the above algorithm and using criteria based on the limit behavior of the power series coefficients obtained via Mathematica’s Series function eventually turns up the following sparse proof of essential conditionality for (2.5): $\begin{matrix} p_{0000} = 0, p_{0001} = 0, p_{0010} = \frac{1}{5} - ε, p_{0011} = 0, \\ p_{0100} = 0, p_{0101} = 0, p_{0110} = \frac{1}{5}, p_{0111} = 0, \\ p_{1000} = 0, p_{1001} = 0, p_{1010} = \frac{1}{5}, p_{1011} = 0, \\ p_{1100} = ε, p_{1101} = \frac{1}{5}, p_{1110} = 0, p_{1111} = \frac{1}{5} . \end{matrix}$

The CI assumptions of (2.5) are only satisfied in the limit $ε = 0$ since $\begin{matrix} Δ (X, Z | U) = Δ (Y, Z | U) = \frac{1}{5} \log (\frac{27}{{(3 - 5 ε)}^{3 - 5 ε} \cdot {(1 + 5 ε)}^{1 + 5 ε}}) \\ = \log (3) ε - \frac{10}{3} ε^{2} + \frac{100}{27} ε^{3} + O (ε^{4}) . \end{matrix}$
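
As a sketch of where this closed form comes from (the shorthand $η (t) = - t \log t$ and the entropy symbol $H$ with natural logarithm are introduced here for illustration only), note that on the curve the nonzero marginal probabilities are $\frac{2}{5} - ε, \frac{1}{5} + ε, \frac{2}{5}$ for $(X, U)$ , $\frac{3}{5} - ε, ε, \frac{1}{5}, \frac{1}{5}$ for $(Z, U)$ , $\frac{2}{5} - ε, ε, \frac{1}{5}, \frac{1}{5}, \frac{1}{5}$ for $(X, Z, U)$ and $\frac{3}{5}, \frac{2}{5}$ for $U$ . Most terms cancel: $\begin{matrix} Δ (X, Z | U) = H (X U) + H (Z U) - H (X Z U) - H (U) \\ = η (\frac{1}{5} + ε) + η (\frac{3}{5} - ε) - η (\frac{1}{5}) - η (\frac{3}{5}) . \end{matrix}$

Replacing $X$ by $Y$ gives the same marginal probabilities and hence the same value. The derivative with respect to $ε$ is $\log \frac{3 - 5 ε}{1 + 5 ε}$ , which equals $\log (3)$ at $ε = 0$ in agreement with the series, so both quantities are strictly positive for small $ε > 0$ and vanish at $ε = 0$ .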

This makes it possible to violate the Ingleton inequality, and indeed: $\begin{matrix} □ (X Y | Z U) = \log (\sqrt[5]{\frac{27}{8000}} \cdot \frac{{(\frac{1}{5} - ε)}^{\frac{1}{5} - ε} \cdot {(\frac{4}{5} - ε)}^{\frac{4}{5} - ε} \cdot {(\frac{2}{5} + ε)}^{\frac{2}{5} + ε} \cdot ε^{ε}}{{(\frac{2}{5} - ε)}^{2 (\frac{2}{5} - ε)} \cdot {(\frac{3}{5} - ε)}^{\frac{3}{5} - ε} \cdot {(\frac{1}{5} + ε)}^{3 (\frac{1}{5} + ε)}}) \\ = (\log (30 ε) - 1) ε - \frac{155}{24} ε^{2} + \frac{11525}{864} ε^{3} + O (ε^{4}) . \end{matrix}$

The expression ( $□_{λ}$ ) in our case is $□ (X Y | Z U) + λ (Δ (X, Z | U) + Δ (Y, Z | U)) = (- 1 + 2 λ \log (3) + \log (30 ε)) ε + O (ε^{2})$ . The coefficient of $ε$ tends to $- \infty$ as $ε \to 0$ for any fixed $λ$ . Hence, every unconditional version of (2.5) can be violated on our curve of distributions, which proves essential conditionality.
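
The argument can also be checked numerically. The following Python sketch is an illustration only, not the Mathematica computation used for this note. It assumes natural logarithms and the expansion $□ (X Y | Z U) = Δ (Z, U | X) + Δ (Z, U | Y) + Δ (X, Y) - Δ (Z, U)$ of the Ingleton expression, which reproduces the closed form displayed above. The helper names `curve`, `H`, `delta`, `ingleton`, `box_lambda` and `violating_eps` are hypothetical.

```python
from math import e, log

X, Y, Z, U = range(4)  # coordinate positions in an atom (x, y, z, u)


def curve(eps):
    """Nonzero atomic probabilities p[(x, y, z, u)] of the curve at parameter eps."""
    return {
        (0, 0, 1, 0): 0.2 - eps,
        (0, 1, 1, 0): 0.2,
        (1, 0, 1, 0): 0.2,
        (1, 1, 0, 0): eps,
        (1, 1, 0, 1): 0.2,
        (1, 1, 1, 1): 0.2,
    }


def H(p, idx):
    """Shannon entropy (natural logarithm) of the marginal on the coordinates idx."""
    marg = {}
    for atom, prob in p.items():
        key = tuple(atom[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + prob
    return -sum(q * log(q) for q in marg.values() if q > 0)


def delta(p, i, j, K=()):
    """Conditional mutual information Delta(i, j | K)."""
    K = tuple(K)
    return H(p, (i,) + K) + H(p, (j,) + K) - H(p, (i, j) + K) - H(p, K)


def ingleton(p):
    """Ingleton expression, under the expansion assumed in the lead-in."""
    return delta(p, Z, U, (X,)) + delta(p, Z, U, (Y,)) + delta(p, X, Y) - delta(p, Z, U)


def box_lambda(p, lam):
    """Left-hand side of the unconditional candidate for (2.5) with one multiplier lam."""
    return ingleton(p) + lam * (delta(p, X, Z, (U,)) + delta(p, Y, Z, (U,)))


def violating_eps(lam, factor=0.1):
    """An eps below e / (30 * 9**lam), where the first-order coefficient is negative."""
    return factor * e / (30 * 9 ** lam)


if __name__ == "__main__":
    for lam in (0.0, 1.0, 2.0, 5.0):
        eps = violating_eps(lam)
        value = box_lambda(curve(eps), lam)
        print(f"lambda={lam:4.1f}  eps={eps:.3e}  value={value:.3e}")
```

The threshold in `violating_eps` comes from solving $- 1 + 2 λ \log (3) + \log (30 ε) < 0$ for $ε$ . Because the required $ε$ shrinks like $9^{- λ}$ , double precision limits this check to moderate values of $λ$ ; exact or interval arithmetic would be needed beyond that.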

5 Remarks

(1) The distribution constructed in Section 3 satisfies the four CI statements in $L$ and none other. This can be checked computationally but it also follows from Section 2.3 since every superset of $L$ implies the Ingleton inequality.

(2) The entropy vector of that distribution is a conic combination of twelve extreme rays of $H_{4}$ (corresponding to the twelve coatoms in the lattice of semimatroids above $L$ ; cf. [Citation15]). The only ray which violates the Ingleton inequality is not entropic. Thus, our construction gives an entropic conic combination of these not necessarily entropic polymatroids where the non-Ingleton component has sufficiently high weight.

(3) All counterexamples to potential conditional Ingleton inequalities with inclusion-minimal assumptions [Citation19, Section IV.B] as well as all proofs of essential conditionality [Citation7, Section IV.A] require only rational binary distributions. This is remarkable insofar as there exist CI inference rules which are valid for binary random vectors but not in general; see [Citation13]. Whether every wrong CI inference rule can be refuted by a rational distribution is equivalent to [Citation10, Conjecture] and still open.

(4) The method of [Citation13] to construct binary distributions with prescribed CI structure using the Fourier–Stieltjes transform even produces distributions close to the uniform distribution. This allows one to concentrate on satisfying the CI equations only, because every binary tensor close to the uniform distribution has strictly positive entries and thus yields a positive probability distribution after multiplying all entries by a normalizing constant. Matúš’s parameterization of the model $L_{2}$ depends on a solution to the associated solvability system whose components appear as exponents of the parameters. The smallest integral solution to the solvability system is $(x_{12}, x_{13}, x_{14}, x_{23}, x_{24}, x_{34}) = (1, 2, 1, 1, 2, 1)$ ; see [Citation13, Theorem 1] for details. In the nomenclature of this theorem (and its proof), the non-Ingleton score is then given by $(γ^{2} + 1) \log (γ^{2} + 1) + \frac{1}{2} (γ - 1) \log (γ - 1) - (γ^{2} - 1) \log (γ^{2} - 1) - \frac{1}{2} (γ + 1) \log (γ + 1)$ for $γ$ small but positive. This function in $γ$ has one root in the interval $(0, 1)$ where it passes from negative on the left to positive values on the right. The root has the approximate value of $0.72766$ . Using cylindrical algebraic decomposition in Mathematica, it can be verified that Matúš’s construction does not produce tensors with nonnegative entries (ergo probability distributions) if $γ > 0.727$ is imposed. It remains open whether there exist counterexamples to the validity of the Ingleton inequality subject to $L$ and arbitrarily close to uniform or even just without zero entries.

(5) The same method applies to the search for a proof of essential conditionality in Section 4 because the CI assumptions $[X ⊥ ⊥ Z | U] \land [Y ⊥ ⊥ Z | U]$ have conditioning sets of size one. Moreover, this statistical model has a rational parameterization: its conditionals with respect to $U$ belong to the marginal independence model $[X ⊥ ⊥ Z] \land [Y ⊥ ⊥ Z]$ which has a monomial parameterization in Möbius coordinates by [Citation2]. Lastly, the entropy vectors arising from those distributions in the marginal independence model which have no private information have been completely characterized by [Citation11]. The random search carried out in Section 4 found a counterexample more quickly than any of these approaches.

(6) Combinatorial and group-theoretic constructions of distributions with large violations of the Ingleton inequality have been investigated in [Citation1] in the context of the four-atom conjecture, which was then refuted in [Citation14].

(7) The last part of Open Question 2 in [Citation19] concerns validity of (2.1)–(2.5) for almost-entropic points. As mentioned in Remark 4.1 some cases are settled in [Citation7] with different answers. The status of (2.2) and of (2.5) is open.
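
Two of the remarks above can be reproduced with a short script. The Python sketch below is an illustration under stated assumptions, not the code used for this note. For Remark (1) it copies the atomic probabilities from Section 1, tests all 24 elementary CI statements exactly in rational arithmetic, and evaluates the Ingleton expression assuming natural logarithms and the expansion $□ (X Y | Z U) = Δ (Z, U | X) + Δ (Z, U | Y) + Δ (X, Y) - Δ (Z, U)$ . For Remark (4) it locates the root of the non-Ingleton score by bisection, assuming that each term $x \log x$ is read as $x \log | x |$ , since $γ - 1$ and $γ^{2} - 1$ are negative on $(0, 1)$ . All function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import combinations, product
from math import log

NAMES = "XYZU"
X, Y, Z, U = range(4)

# Atomic probabilities p[(x, y, z, u)] copied from Section 1; unlisted atoms are 0.
P = {
    (0, 0, 0, 0): F(20, 77),
    (0, 1, 0, 0): F(20, 693), (0, 1, 0, 1): F(4, 99),
    (0, 1, 1, 0): F(10, 693), (0, 1, 1, 1): F(2, 99),
    (1, 0, 0, 0): F(20, 693), (1, 0, 0, 1): F(40, 99),
    (1, 0, 1, 0): F(1, 693), (1, 0, 1, 1): F(2, 99),
    (1, 1, 1, 1): F(2, 11),
}


def marginal(idx):
    """Exact marginal distribution on the coordinates idx."""
    marg = {}
    for atom, prob in P.items():
        key = tuple(atom[i] for i in idx)
        marg[key] = marg.get(key, F(0)) + prob
    return marg


def holds(i, j, K):
    """Exact test of [i _||_ j | K]: p(a,b,k) p(k) == p(a,k) p(b,k) for all values."""
    pijK, piK, pjK, pK = (marginal(s) for s in ((i, j) + K, (i,) + K, (j,) + K, K))
    for a, b, *k in product((0, 1), repeat=2 + len(K)):
        k = tuple(k)
        lhs = pijK.get((a, b) + k, 0) * pK.get(k, 0)
        rhs = piK.get((a,) + k, 0) * pjK.get((b,) + k, 0)
        if lhs != rhs:
            return False
    return True


def ci_structure():
    """All elementary CI statements satisfied by P, as (i, j, K) name triples."""
    found = set()
    for i, j in combinations(range(4), 2):
        rest = [k for k in range(4) if k not in (i, j)]
        for size in range(3):
            for K in combinations(rest, size):
                if holds(i, j, K):
                    found.add((NAMES[i], NAMES[j], "".join(NAMES[k] for k in K)))
    return found


def H(idx):
    """Shannon entropy (natural logarithm) of the marginal on idx, in floating point."""
    return -sum(float(q) * log(float(q)) for q in marginal(idx).values() if q > 0)


def delta(i, j, K=()):
    """Conditional mutual information Delta(i, j | K)."""
    return H((i,) + K) + H((j,) + K) - H((i, j) + K) - H(K)


def ingleton():
    """Ingleton expression, under the expansion assumed in the lead-in."""
    return delta(Z, U, (X,)) + delta(Z, U, (Y,)) + delta(X, Y) - delta(Z, U)


def score(g):
    """Non-Ingleton score of Remark (4), each term read as x * log|x| (assumption)."""
    def xlogx(x):
        return x * log(abs(x))
    return xlogx(g * g + 1) + 0.5 * xlogx(g - 1) - xlogx(g * g - 1) - 0.5 * xlogx(g + 1)


def bisect(f, lo, hi, tol=1e-12):
    """Root of f in [lo, hi] by bisection, assuming f(lo) and f(hi) differ in sign."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2
```

Under these assumptions, `ci_structure()` returns exactly the four statements in $L$ , `ingleton()` gives a value close to $- 0.00757$ , and `bisect(score, 0.5, 0.9)` gives a value close to $0.72766$ . The bracket $[0.5, 0.9]$ is an arbitrary choice on which the score changes sign.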

Acknowledgments

I thank the anonymous referee for hints for improving the presentation. I would also like to thank Mima Stanojkovski and Rosa Winter for their immediate interest, code samples and an inspiring discussion about finding rational points on varieties—even though the brute force approach turned out to succeed more quickly this time.

References

Boston, N., Nan, T.-T. (2012). Large violations of the Ingleton inequality. In: 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1588–1593. DOI: 10.1109/Allerton.2012.6483410.
Boege, T., Petrović, S., Sturmfels, B. (2022). Marginal independence models. In: Proceedings of the 2022 International Symposium on Symbolic and Algebraic Computation, ISSAC ’22. Association for Computing Machinery (ACM), pp. 263–271. DOI: 10.1145/3476446.3536193.
Chan, T. H. (2001). A combinatorial approach to information inequalities. Commun. Inf. Syst. 1(3): 241–253. DOI: 10.4310/CIS.2001.v1.n3.a1.
Fujishige, S. (1978). Polymatroidal dependence structure of a set of random variables. Inf. Control 39(1): 55–72. DOI: 10.1016/S0019-9958(78)91063-X.
Gómez, A., Mejía, C., Montoya, J. A. (2017). Defining the almost-entropic regions by algebraic inequalities. Int. J. Inf. Coding Theory 4(1): 1–18. DOI: 10.1504/IJICOT.2017.081456.
Ingleton, A. W. (1971). Representation of matroids. In: Welsh, D. J. A., ed. Combinatorial Mathematics and its Applications. Proceedings of a Conference held at the Mathematical Institute, Oxford, from 7–10 July, 1969, pp. 149–167.
Kaced, T., Romashchenko, A. (2013). Conditional information inequalities for entropic and almost entropic points. IEEE Trans. Inf. Theory 59(11): 7149–7167. DOI: 10.1109/TIT.2013.2274614.
Matúš, F. (1995). Conditional independences among four random variables. II. Combin. Probab. Comput. 4(4): 407–417. DOI: 10.1017/S0963548300001747.
Matúš, F. (1997). Conditional independence structures examined via minors. Ann. Math. Artif. Intell. 21(1): 30–99. DOI: 10.1023/A:1018957117081.
Matúš, F. (1999). Conditional independences among four random variables. III. Final conclusion. Combin. Probab. Comput. 8(3): 269–276. DOI: 10.1017/S0963548399003740.
Matúš, F. (2006). Piecewise linear conditional information inequality. IEEE Trans. Inf. Theory 52(1): 236–238. DOI: 10.1109/TIT.2005.860438.
Matúš, F. (2007). Infinitely many information inequalities. In: Proceedings of the IEEE ISIT 2007, pp. 41–44.
Matúš, F. (2018). On patterns of conditional independences and covariance signs among binary variables. Acta Math. Hung. 154(2): 511–524. DOI: 10.1007/s10474-018-0799-6.
Matúš, F., Csirmaz, L. (2016). Entropy region and convolution. IEEE Trans. Inf. Theory 62(11): 6007–6018. DOI: 10.1109/TIT.2016.2601598.
Matúš, F., Studený, M. (1995). Conditional independences among four random variables. I. Combin. Probab. Comput. 4(3): 269–278. DOI: 10.1017/S0963548300001644.
Macintyre, A. J., Wilkie, A. J. (1996). On the decidability of the real exponential field. In: Odifreddi, P., ed. Kreiseliana: About and Around Georg Kreisel. Natick, MA: A. K. Peters, pp. 441–467.
Shannon, C. E. (1948). A mathematical theory of communication. Bell Syst. Tech. J. 27: 379–423, 623–656. DOI: 10.1002/j.1538-7305.1948.tb01338.x, 10.1002/j.1538-7305.1948.tb00917.x.
Sturmfels, B. (1996). Gröbner Bases and Convex Polytopes, Vol. 8. Providence, RI: American Mathematical Society. DOI: 10.1090/ulect/008.
Studený, M. (2021). Conditional independence structures over four discrete random variables revisited: conditional Ingleton inequalities. IEEE Trans. Inf. Theory 67(11): 7030–7049. DOI: 10.1109/TIT.2021.3104250.
Sullivant, S. (2018). Algebraic Statistics, vol. 194 of Graduate Studies in Mathematics. Providence, RI: American Mathematical Society. DOI: 10.1090/gsm/194.
Zhang, Z., Yeung, R. W. (1997). A non-Shannon-type conditional inequality of information quantities. IEEE Trans. Inf. Theory 43(6): 1982–1986. DOI: 10.1109/18.641561.

Mathematical Software

4ti2 team. 4ti2, a software package for algebraic, geometric and combinatorial problems on linear spaces. Available at https://4ti2.github.io. Version 1.6.9.
Biere, A. (2019). CaDiCaL at the SAT Race 2019. In: Heule, M., Järvisalo, M., Suda, M., eds. Proc. of SAT Race 2019–Solver and Benchmark Descriptions, vol. B-2019-1 of Department of Computer Science Series of Publications B, pp. 8–9. University of Helsinki. Available at https://github.com/arminbiere/cadical. Version 1.3.1.
Bruns, W., Ichim, B., Söger, C., von der Ohe, U. Normaliz. Algorithms for rational cones and affine monoids. Available at https://www.normaliz.uni-osnabrueck.de. Version 3.8.4.
Grayson, D. R., Stillman, M. E. Macaulay2, a software system for research in algebraic geometry. Available at https://www.math.uiuc.edu/Macaulay2/. Version 1.16.
Wolfram Research, Inc. Mathematica. Champaign, IL, 2018. Version 11.3.
