Bayesian Analysis on a Noncentral Fisher

ABSTRACT

Fisher succeeded early on in redefining Student’s t-distribution in geometrical terms on a central hypersphere. Intriguingly, no noncentral analytical extension of this fundamental Fisher–Student’s central hypersphere h-distribution exists. We therefore set out to derive the noncentral h-distribution and use it to illustrate graphically the limitations of the Neyman–Pearson null hypothesis significance testing framework and the strengths of the Bayesian statistical hypothesis analysis framework on the hypersphere polar axis, a compact nontrivial one-dimensional parameter space. Using a geometrically meaningful maximal entropy prior, we reinterpret the apparent failure of an important psychological science reproducibility project. We then show that the Bayes factor appropriately models the two-sample t-test p-value density of a gene expression profile produced by the high-throughput genomic-scale microarray technology, and provides a simple expression for a local false discovery rate addressing the multiple hypothesis testing problem brought about by such a technology.

1. Introduction

The statistical analysis literature is replete with words of caution on the use and misuse of various statistical hypothesis testing methods, the contrasting discourse about the much used but heavily criticized Neyman–Pearson null hypothesis significance testing (NHST) framework and the powerful but not yet fully embraced Bayesian hypothesis testing framework being particularly notorious. See, for example, Greenland et al. (2016), Rothman (2016), Wasserstein and Lazar (2016), and references therein for a comprehensive overview. See also Goodman (1999a, 1999b) for a more restricted point of view on medical statistics. Our contribution to this debate is meant to be modest and focused: we shall recast the classical two-sample t-test in an intuitive geometrical setting so as to allow the reader to assess the respective strengths and weaknesses of the NHST and Bayesian frameworks through informative graphical representations on a compact nontrivial one-dimensional parameter space. The practical examples chosen are drawn from the biomedical research realm, which is undergoing major conceptual shifts in data analysis, partly in response to the creation of genomic-scale high-throughput technologies and partly in response to a science reproducibility problem.

Many authors have commented on the elegance and simplicity of geometrical approaches to statistics. In their paper entitled ‘The geometry of estimation’, Durbin and Kendall (1951) state:

“In the ultimate analysis geometrical ‘proofs’ in more than three dimensions are only restatements of analytical results in a special language; but they are nevertheless very useful, partly because of their elegance and partly because they carry a greater degree of conviction and understanding, to some minds at least, than the analytical approach. They also suggest generalizations (...).”

Sir Ronald Fisher was among those advocating early on a geometrical approach to statistics. Historical accounts narrate how he cleanly redefined Student’s t-distribution on a hypersphere (Gorroochurn 2016). It is therefore intriguing that no trace can be found in the literature of a noncentral analytical extension of the fundamental Fisher–Student’s central hypersphere distribution, as such an extension could ease illustration of the strengths of the Bayesian framework on the hypersphere polar axis, a compact nontrivial one-dimensional parameter space. Instead, one still has to rely on an unwieldy noncentral t-distribution, which has hindered such an endeavour until now.

The geometrical framework herein advocated will allow us to: reexpress Student’s t-distribution as the Fisher–Student’s central hypersphere h-distribution, derive its analytical noncentral extension, and compare the analytical noncentral h-distribution to the unwieldy noncentral t-distribution in Section 2; graphically compare over compact domains the NHST and Bayesian hypothesis testing frameworks in Sections 3 and 4; explore graphically how the Bayesian framework interprets the apparent failure of an important psychological science reproducibility project, fares when analyzing a gene expression profile microarray dataset, and provides a simple expression for a local false discovery rate addressing the multiple hypothesis testing problem in Section 5.

In order not to distract the reader, our argumentation will mostly consist of statements without extensive proofs. The reader is referred to the Appendices for details.

2. Noncentral Hypersphere h-distribution

Consider an experiment consisting of measuring one continuous outcome in two different experimental conditions, $n_1$ times for the first condition and $n_2$ times for the second. The result of such an experiment can be collated into an observation vector $o = \begin{pmatrix} o_1 \\ o_2 \end{pmatrix}$ of length $N = n_1 + n_2$. The observation vector $o$ can be projected on the overall center-of-mass $C$, the between-class variance hyperplane $B$, and the within-class variance hyperplane $W$ using the matrix projectors $P_C$, $P_B$, and $P_W$ defined in Appendix A. In both projection and trigonometric terms, the two-sample t-statistic is defined as the signed square root of the variance ratio

$$t = \left[\nu\,\frac{o^{t} P_B\, o}{o^{t} P_W\, o}\right]^{1/2} = \sqrt{\nu}\,\frac{\cos\theta}{\sin\theta}, \qquad -\infty \le t \le \infty,\quad 0 \le \theta \le \pi,\quad \nu = N - 2. \tag{1}$$

For the two-sample case, the matrix projector $P_B$ has rank one and can be expanded as $P_B = T_B T_B^{t}$, where the relevant eigenvector

$$T_B = \frac{1}{\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \begin{pmatrix} \frac{1_{n_1}}{n_1} \\ -\frac{1_{n_2}}{n_2} \end{pmatrix} = \frac{1}{\sqrt{2n}} \begin{pmatrix} 1_n \\ -1_n \end{pmatrix} \quad \text{if } n_1 = n_2 = n, \tag{2}$$

allows computation, to within a constant, of the difference of means between the two experimental conditions. In the following, $T_B$ will be referred to as the unit polar axis of the hypersphere under consideration. When the random observation vector $o$ is distributed according to the maximal entropy equiprobability distribution on the unit-radius hypersphere $S^{\nu}$ (arguably one of the most important continuous distributions in probability theory), its projection $\cos\theta = T_B^{t} o$ on the polar axis is distributed according to the Fisher–Student’s central h-distribution

$$\rho_{(\nu,0)}(\theta) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{1}{2}\right)\Gamma\!\left(\frac{\nu}{2}\right)} \sin^{\nu-1}\theta, \qquad 0 \le \theta \le \pi \tag{3}$$

(Fisher et al. 1925). This distribution will stand hereafter for the null hypothesis $H_0$. The central h-distribution is symmetric under reflection across the equatorial midline at $\cos\theta = 0$, and its width narrows as $\nu$ increases, an intrinsic geometrical property of high-dimensional hyperspheres, which pack most of their surface area into their equatorial bulge. The null index in the parameter set $(\nu, \delta = 0)$ refers to the fact that, for a central distribution, the noncentrality parameter $\delta$, to be formally introduced below, is zero. Poincaré’s lemma states that, as $\nu$ goes to infinity, the rescaled projection $\sqrt{\nu}\cos\theta$ converges in distribution to the standard normal distribution (Mazliak 2015). Cumulative distribution functions and, consequently, p-values for the central hypersphere distribution (Equation (3)) can be computed analytically: see Appendix D.
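As a quick numerical check of Equation (3), the sketch below (assuming NumPy and SciPy; the helper names are our own, hypothetical choices) evaluates the central h-density, verifies that it integrates to one on $[0, \pi]$, and confirms that its one-tail cumulative $p(\theta) = \int_0^\theta \rho_{(\nu,0)}$ coincides with Student's upper-tail probability at $t = \sqrt{\nu}\cot\theta$, as Equation (1) implies. The value $\nu = 10$ is arbitrary.

```python
import numpy as np
from scipy import integrate, special, stats


def central_h_pdf(theta, nu):
    """Fisher-Student central h-density rho_(nu,0)(theta), Equation (3)."""
    # Log-gamma form keeps the normalizing constant finite for large nu.
    log_norm = (special.gammaln((nu + 1) / 2)
                - special.gammaln(0.5) - special.gammaln(nu / 2))
    return np.exp(log_norm) * np.sin(theta) ** (nu - 1)


def central_h_pvalue(theta, nu):
    """One-tail p(theta): integral of rho_(nu,0) from 0 to theta, by quadrature."""
    value, _ = integrate.quad(central_h_pdf, 0.0, theta, args=(nu,))
    return value


nu = 10
total, _ = integrate.quad(central_h_pdf, 0.0, np.pi, args=(nu,))
print(f"normalization: {total:.12f}")

for theta in (0.3, 1.0, np.pi / 2, 2.5):
    t_stat = np.sqrt(nu) / np.tan(theta)       # Equation (1): t = sqrt(nu) cot(theta)
    p_quad = central_h_pvalue(theta, nu)
    p_student = stats.t.sf(t_stat, df=nu)       # Student upper-tail probability
    print(f"theta={theta:.3f}  p(quad)={p_quad:.10f}  p(Student)={p_student:.10f}")
```

The agreement reflects the change of variable $t = \sqrt{\nu}\cot\theta$, under which $\sin^{\nu+1}\theta = (1 + t^2/\nu)^{-(\nu+1)/2}$ recovers Student's density with $\nu$ degrees of freedom.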

Figure 1. Translated and noncentral hypersphere distribution. The silhouetted hypersphere $S^{\nu=2}$ is translated along the horizontal polar axis by a normalized effect size $\delta = 1/\sqrt{2}$, re-expressed in geometrical terms as $\cos\theta_{(\nu,\delta)} = \delta/\sqrt{\delta^2 + \nu}\,\big|_{(\nu=2,\,\delta=1/\sqrt{2})} = 1/\sqrt{5}$. All experimental measurements are expressed in terms of the polar angle $\theta$, which relates to the null hypothesis $H_0$ hypersphere $\rho_{(\nu,\delta=0)}(\theta)$ centered on the origin. The angle $\Theta$, for its part, relates to measurements made in the intrinsic reference system centered on the round dashed translated hypersphere. As the angle $\Theta$ rotates from 0 to $\pi$, the translated symmetric hypersphere is mapped vertically above the observation’s projection $\cos\theta_o$ onto the asymmetric noncentral h-distribution $\rho^{h}_{(\nu=2,\,\delta=1/\sqrt{2})}(\theta)$ of interest.

We now extend the Fisher–Student’s central h-distribution (Equation (3)) to a noncentral h-distribution whose center of mass is shifted away from the origin of the reference system by the value of the noncentrality parameter $\delta$. In geometric terms, the t-like noncentrality parameter $\delta$ defines the angle $\cot\theta_{(\nu,\delta)} = \delta/\sqrt{\nu}$, which imparts to the hypersphere the translation

$$\cos\theta_{(\nu,\delta)} = \frac{\delta/\sqrt{\nu}}{\sqrt{(\delta/\sqrt{\nu})^2 + 1}} = \frac{\delta}{\sqrt{\delta^2 + \nu}} \tag{4}$$

on the finite cosine range $-1 \le \cos\theta \le 1$ along the polar axis $T_B$ defined in Equation (2). As represented in Figure 1, angular measurements are affected by the translation. In the reference system centered on the translated hypersphere, the cotangent of the polar angle is given by

$$\cot\Theta_{(\nu,\delta)}(\theta) = \frac{\cos\theta - \cos\theta_{(\nu,\delta)}}{\sin\theta},$$

reminiscent of the z-score $(x - \mu)/\sigma$ for a normal distribution with nonvanishing mean $\mu$, from which one computes the transformation angle

$$\Theta_{(\nu,\delta)}(\theta) = \operatorname{arccot}\left(\frac{\cos\theta - \cos\theta_{(\nu,\delta)}}{\sin\theta}\right), \tag{5}$$

with $\operatorname{arccot}$ taking values in $[0, \pi]$, so that $\Theta$ increases monotonically from 0 to $\pi$ as $\theta$ does. In the translated reference system, the central h-distribution given by Equation (3) holds, much as the shape of a normal distribution is left unchanged by a simple translation, as graphically represented in Figure 1. In the original reference system, the noncentral h-distribution follows from the change of variable (5):

$$\rho^{h}_{(\nu,\delta)}(\theta) = \rho_{(\nu,0)}\big(\Theta_{(\nu,\delta)}(\theta)\big)\,\frac{d\Theta_{(\nu,\delta)}(\theta)}{d\theta} = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{1}{2}\right)\Gamma\!\left(\frac{\nu}{2}\right)} \sin^{\nu-1}\Theta_{(\nu,\delta)}(\theta)\, J_{(\nu,\delta)}(\theta), \tag{6}$$

where the Jacobian of the transformation is

$$J_{(\nu,\delta)}(\theta) = \frac{d\Theta_{(\nu,\delta)}(\theta)}{d\theta} = \frac{1 - \cos\theta\cos\theta_{(\nu,\delta)}}{1 - 2\cos\theta\cos\theta_{(\nu,\delta)} + \cos^2\theta_{(\nu,\delta)}}.$$

Since $\sin^2\Theta_{(\nu,\delta)}(\theta) = \sin^2\theta \,/\, [1 - 2\cos\theta\cos\theta_{(\nu,\delta)} + \cos^2\theta_{(\nu,\delta)}]$, the various terms of the h-distribution (6) can be regrouped so that the noncentral h-distribution simply reads

$$\rho^{h}_{(\nu,\delta)}(\theta) = L^{h}_{(\nu,\delta)}(\theta)\,\rho_{(\nu,0)}(\theta), \tag{7}$$

where the multiplicative function $L^{h}_{(\nu,\delta)}(\theta)$, readily interpreted as the likelihood ratio between the noncentral $\rho^{h}_{(\nu,\delta)}$ and central $\rho_{(\nu,0)}$ h-distributions, is given by

$$L^{h}_{(\nu,\delta)}(\theta) = \frac{1 - \cos\theta\cos\theta_{(\nu,\delta)}}{\left[1 - 2\cos\theta\cos\theta_{(\nu,\delta)} + \cos^2\theta_{(\nu,\delta)}\right]^{\frac{\nu+1}{2}}}. \tag{8}$$

Since the hypersphere dimension $\nu$ is usually prespecified by the experimental setup, the noncentral h-distribution is essentially parameterized by the noncentrality parameter $\delta$, as re-expressed in geometrical terms in Equation (4). Compare with Gönen et al. (2005) and Wang and Liu (2016). When $\delta = 0$, $\cos\theta_{(\nu,\delta)} = 0$ and $\rho^{h}_{(\nu,0)}(\theta)$ simplifies to the Fisher–Student’s central h-distribution $\rho_{(\nu,0)}(\theta)$, as expected.

From its definition in Equation (6), the integral of the noncentral h-distribution between any bounds is

$$\int_{\theta_1}^{\theta_2} \rho^{h}_{(\nu,\delta)}(\theta)\,d\theta = \int_{\Theta_{(\nu,\delta)}(\theta_1)}^{\Theta_{(\nu,\delta)}(\theta_2)} \rho_{(\nu,0)}(\Theta)\,d\Theta;$$

that is, integrating the noncentral h-distribution over $[\theta_1, \theta_2]$ simply boils down to integrating Student’s distribution (3) over the transformed bounds $[\Theta_{(\nu,\delta)}(\theta_1), \Theta_{(\nu,\delta)}(\theta_2)]$ provided by Equation (5). All the relevant integrations have been carried out analytically, and the results are provided in Appendix D. For the sake of completeness, we state without proof that the noncentral t-distribution, formally derived as the distribution of the ratio of a normal random variable $\mathcal{N}(\delta, 1)$ to an independent $\chi_\nu$ random variable divided by $\sqrt{\nu}$, can similarly be rewritten as

$$\rho^{t}_{(\nu,\delta)}(\theta) = L^{t}_{(\nu,\delta)}(\theta)\,\rho_{(\nu,0)}(\theta), \qquad 0 \le \theta \le \pi,$$

where the likelihood ratio $L^{t}_{(\nu,\delta)}(\theta)$ is given by

$$L^{t}_{(\nu,\delta)}(\theta) = \frac{e^{-\delta^2/2}}{\Gamma\!\left(\frac{\nu+1}{2}\right)} \int_0^\infty e^{-v}\, e^{(\sqrt{2}\,\delta\cos\theta)\,v^{1/2}}\, v^{\frac{\nu-1}{2}}\,dv = e^{-\delta^2/2} \sum_{j=0}^{\infty} \frac{\Gamma\!\left(\frac{j+\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu+1}{2}\right)} \frac{(\sqrt{2}\,\delta\cos\theta)^j}{j!}$$

after expansion of the exponential and use of the definition of the gamma function. The likelihood ratio $L^{t}_{(\nu,\delta)}(\theta)$ thus involves a cumbersome infinite sum. When $\delta = 0$, $L^{t}_{(\nu,0)}(\theta) = 1$ and the noncentral t-distribution $\rho^{t}_{(\nu,\delta)}(\theta)$ simplifies to the Fisher–Student’s central h-distribution $\rho_{(\nu,0)}(\theta)$, as expected.
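The construction of Equations (4) to (8) can be checked numerically. The sketch below, assuming NumPy and SciPy and using hypothetical helper names, implements the factorized density (7), compares it with the change-of-variable form (6), and computes a noncentral cumulative probability through the transformed bound $\Theta_{(\nu,\delta)}(\theta)$ rather than by direct quadrature. The call `np.arctan2(np.sin(theta), np.cos(theta) - c)` is our way of evaluating the arccot of Equation (5) on the branch $[0, \pi]$; the values $\nu = 10$, $\delta = 1.5$ are arbitrary.

```python
import numpy as np
from scipy import integrate, special, stats


def central_h_pdf(theta, nu):
    """Central h-density rho_(nu,0)(theta), Equation (3)."""
    log_norm = (special.gammaln((nu + 1) / 2)
                - special.gammaln(0.5) - special.gammaln(nu / 2))
    return np.exp(log_norm) * np.sin(theta) ** (nu - 1)


def shift_cosine(nu, delta):
    """cos theta_(nu,delta) = delta / sqrt(delta^2 + nu), Equation (4)."""
    return delta / np.sqrt(delta ** 2 + nu)


def big_theta(theta, nu, delta):
    """Transformation angle Theta_(nu,delta)(theta), Equation (5), valued in [0, pi]."""
    c = shift_cosine(nu, delta)
    # arccot(x / y) with y = sin(theta) >= 0 equals arctan2(y, x) on [0, pi].
    return np.arctan2(np.sin(theta), np.cos(theta) - c)


def likelihood_ratio_h(theta, nu, delta):
    """L^h_(nu,delta)(theta), Equation (8)."""
    c = shift_cosine(nu, delta)
    base = 1.0 - 2.0 * np.cos(theta) * c + c ** 2
    return (1.0 - np.cos(theta) * c) / base ** ((nu + 1) / 2)


def noncentral_h_pdf(theta, nu, delta):
    """Noncentral h-density in the factorized form of Equation (7)."""
    return likelihood_ratio_h(theta, nu, delta) * central_h_pdf(theta, nu)


def noncentral_h_pdf_cov(theta, nu, delta):
    """Same density via Equation (6): rho_(nu,0)(Theta) * dTheta/dtheta."""
    c = shift_cosine(nu, delta)
    jac = (1.0 - np.cos(theta) * c) / (1.0 - 2.0 * np.cos(theta) * c + c ** 2)
    return central_h_pdf(big_theta(theta, nu, delta), nu) * jac


def noncentral_h_cdf(theta, nu, delta):
    """Integral of rho^h from 0 to theta, via the central cumulative at Theta(theta).

    The central cumulative is evaluated through Student's t with t = sqrt(nu) cot(Theta).
    """
    return stats.t.sf(np.sqrt(nu) / np.tan(big_theta(theta, nu, delta)), df=nu)


nu, delta = 10, 1.5
grid = np.linspace(0.05, np.pi - 0.05, 7)
gap = np.max(np.abs(noncentral_h_pdf(grid, nu, delta) - noncentral_h_pdf_cov(grid, nu, delta)))
print(f"max gap between Equations (6) and (7): {gap:.2e}")

total, _ = integrate.quad(noncentral_h_pdf, 0.0, np.pi, args=(nu, delta))
print(f"normalization: {total:.10f}")

partial, _ = integrate.quad(noncentral_h_pdf, 0.0, 1.2, args=(nu, delta))
print(f"quad: {partial:.10f}  transformed bound: {noncentral_h_cdf(1.2, nu, delta):.10f}")
```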

To summarize, the noncentral h-distribution $\rho^{h}_{(\nu,\delta)}(\theta)$ is obtained by translating the Fisher–Student’s central h-distribution (Equation (3)) along the polar axis $T_B$, while the noncentral t-distribution $\rho^{t}_{(\nu,\delta)}(\theta)$ is obtained as a marginal of the joint distribution of a normal $\mathcal{N}(\delta, 1)$ variable and an independent $\chi^2_\nu$ variable: the noncentral h-distribution is thus both conceptually and analytically simpler. Furthermore, while both the noncentral h- and t-distributions reduce to the Fisher–Student’s central h-distribution when $\delta = 0$, Figure 2 shows that, for a nonvanishing value of $\delta$, $\rho^{h}_{(\nu,\delta)}(\theta)$ and $\rho^{t}_{(\nu,\delta)}(\theta)$ are almost superposable. We are thus justified in using the closed analytical form (7) of the noncentral h-distribution $\rho^{h}_{(\nu,\delta)}(\theta)$, with its analytical and readily factorized likelihood ratio (8), in order to exploit the Bayesian framework to its fullest in what follows. Finally, a most instructive generalization to the noncentral F-distribution can be found in Appendix C.
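To make the comparison between the two noncentral distributions concrete, the following sketch (assuming NumPy and SciPy; helper names hypothetical) evaluates a truncated version of the series for $L^{t}_{(\nu,\delta)}(\theta)$, checks it against the ratio of SciPy's noncentral and central t densities at $t = \sqrt{\nu}\cot\theta$ (the Jacobian of the change of variable cancels in that ratio), and reports the largest gap between $\rho^{t}_{(\nu,\delta)}$ and $\rho^{h}_{(\nu,\delta)}$ on a grid of angles. The truncation at 200 terms and the choice $\nu = 10$, $\delta = 1$ are our own assumptions; the plain power `x ** j` is only intended for moderate $|\delta|$.

```python
import numpy as np
from scipy import special, stats


def central_h_pdf(theta, nu):
    """Central h-density rho_(nu,0)(theta), Equation (3)."""
    log_norm = (special.gammaln((nu + 1) / 2)
                - special.gammaln(0.5) - special.gammaln(nu / 2))
    return np.exp(log_norm) * np.sin(theta) ** (nu - 1)


def likelihood_ratio_h(theta, nu, delta):
    """Closed-form L^h_(nu,delta)(theta), Equation (8)."""
    c = delta / np.sqrt(delta ** 2 + nu)            # Equation (4)
    base = 1.0 - 2.0 * np.cos(theta) * c + c ** 2
    return (1.0 - np.cos(theta) * c) / base ** ((nu + 1) / 2)


def likelihood_ratio_t(theta, nu, delta, n_terms=200):
    """Truncated series for L^t_(nu,delta)(theta); intended for moderate |delta| (say <= 3)."""
    j = np.arange(n_terms)
    # Gamma ratio and 1/j! combined in log space to avoid overflow.
    log_coef = (special.gammaln((j + nu + 1) / 2) - special.gammaln((nu + 1) / 2)
                - special.gammaln(j + 1))
    x = np.sqrt(2.0) * delta * np.cos(np.atleast_1d(theta))[:, None]
    terms = np.exp(log_coef) * x ** j                # shape (n_theta, n_terms)
    return np.exp(-delta ** 2 / 2) * terms.sum(axis=1)


nu, delta = 10, 1.0
theta = np.linspace(0.4, np.pi - 0.4, 9)
t_stat = np.sqrt(nu) / np.tan(theta)                 # Equation (1)
# The Jacobian dt/dtheta cancels in the ratio of noncentral to central densities.
ratio_scipy = stats.nct.pdf(t_stat, nu, delta) / stats.t.pdf(t_stat, nu)
series = likelihood_ratio_t(theta, nu, delta)
print("max relative gap, series vs SciPy:", np.max(np.abs(series / ratio_scipy - 1.0)))

rho_t = series * central_h_pdf(theta, nu)
rho_h = likelihood_ratio_h(theta, nu, delta) * central_h_pdf(theta, nu)
print("max |rho^t - rho^h| on the grid:", np.max(np.abs(rho_t - rho_h)))
```

The series has to be summed term by term, whereas $L^{h}_{(\nu,\delta)}$ is a single closed-form expression; this is the practical simplification the text refers to.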

Figure 2. Noncentral t-distributions (left) and h-distributions (right) for a varying number of degrees of freedom or, equivalently, hypersphere dimension $\nu$, which can be deduced from the grayscale colorbars. The Fisher–Student’s central hypersphere distribution is given in duplicate in the upper panels. The likelihood ratios $L^{t}_{(\nu,\delta)}(\theta)$ and $L^{h}_{(\nu,\delta)}(\theta)$ for $\delta = 1$ are given in the middle left and right panels, respectively. The noncentral distributions $\rho^{t}_{(\nu,\delta)}(\theta)$ and $\rho^{h}_{(\nu,\delta)}(\theta)$, products of the two functions above them, are given in the lower left and right panels, respectively. The resulting distributions are almost superposable.

3. Null Hypothesis Significance Testing Framework

We shall refer in the following to $\delta$ as the normalized effect size, and to $\Delta$ as the sampling distribution noncentrality parameter. The latter is determined in geometric terms in Appendix B. We shall abide by the convention of using the Greek letters $\delta$ or $\Delta$ to designate the distributions’ noncentrality parameters, and the Roman letters $d$ or $D$ to designate their respective estimates (Cumming and Finch 2001).

The Neyman–Pearson NHST framework calls for rejection of the null hypothesis $\delta = 0$ whenever the observed p-value, established with respect to the central hypersphere distribution $\rho_{(\nu,0)}(\theta)$, is less than or equal to a prechosen Type I error (false positive) level $\alpha$. For the hypersphere distribution of interest, the NHST prescription requires specification of the test specificity at Type I error level $\alpha$,

$$\text{specificity} = 1 - \alpha = 1 - \int_0^{\theta_\alpha} \rho_{(\nu,0)}(\theta)\,d\theta = 1 - p(\theta_\alpha),$$

in terms of the central h-distribution (3). This equation determines the one-tail critical angle $\cot\theta_\alpha = t_\alpha/\sqrt{\nu}$, which in turn determines the sensitivity (power to detect the effect size) at Type II error (false negative) level $\beta$,

$$\text{sensitivity} = 1 - \beta = \int_0^{\theta_\alpha} \rho^{h}_{(\nu,\Delta)}(\theta)\,d\theta = \int_0^{\Theta_{(\nu,\Delta)}(\theta_\alpha)} \rho_{(\nu,0)}(\Theta)\,d\Theta = p\big(\Theta_{(\nu,\Delta)}(\theta_\alpha)\big),$$

in terms of the noncentral h-distribution (7). Figure 3 graphically summarizes these concepts on the finite polar axis. In the frequentist framework, the t-statistic (1) provides a maximum likelihood estimate $D$ of $\Delta$. Similarly, the correlation-like projection $\cos\theta = T_B^{t} o$ provides a maximum likelihood estimate of

$$\cos\theta_{(\nu,\Delta)} = \frac{\Delta/\sqrt{\nu}}{\sqrt{(\Delta/\sqrt{\nu})^2 + 1}} = \frac{\Delta}{\sqrt{\Delta^2 + \nu}}. \tag{9}$$

See Appendix B for various relationships between $\Delta$, $\delta$, $\nu$ and the two sample sizes $(n_1, n_2)$. In order to compute a confidence interval (CI) for the estimate $\cos\theta_{(\nu,D)}$, one needs to express $\theta$ in terms of $\Theta$. Solving the quadratic equation obtained by squaring Equation (5), one finds

$$\cos\theta(\Theta) = \sin^2\Theta\,\cos\theta_{(\nu,D)} \pm \cos\Theta\,\sqrt{1 - \sin^2\Theta\,\cos^2\theta_{(\nu,D)}}.$$

Using the latter, the two-tail confidence interval for $\cos\theta_{(\nu,D)}$ is found to be

$$\cos\theta_{(\nu,D)}\,\mathrm{CI}_{1-\alpha} = \sin^2\theta_{\alpha/2}\,\cos\theta_{(\nu,D)} \pm \cos\theta_{\alpha/2}\,\sqrt{1 - \sin^2\theta_{\alpha/2}\,\cos^2\theta_{(\nu,D)}}, \tag{10}$$

which simplifies to the expected $\pm\cos\theta_{\alpha/2}$ when $\cos\theta_{(\nu,D)} = 0$. This confidence interval is verified to concur with the usual noncentral t-distribution effect size confidence interval (Cumming and Finch 2001). We have plotted in the upper panel of Figure 4 the frequentist two-tail confidence interval $\cos\theta_{(\nu,D)}\,\mathrm{CI}_{1-\alpha}$ for the continuum of estimates $\cos\theta_{(\nu,D)}$ at significance level $\alpha = .05$ and for various equal sample sizes $n_1 = n_2 = n$. Note that the null hypothesis $H_0$ neatly stands on the vertical line at $\cos\theta = 0$. The lower panel of the same figure provides the corresponding test sensitivity (power). It can be readily verified that the lower bound of the confidence interval for $\cos\theta_{(\nu,D)}$ is still negative when the estimate reaches the critical value $\cos\theta_{\alpha/2}$. Thus, when applied to the central and noncentral h-distributions, the Neyman–Pearson prescription fails to produce a significant confidence interval at the critical p-value. In fact, setting the lower bound of Equation (10) to zero and squaring yields $\cos^2\theta_{(\nu,D)} = \cot^2\theta_{\alpha/2}$: a significant confidence interval is only achieved at the larger estimate $\cos\theta_{(\nu,D)} = \cot\theta_{\alpha/2} > \cos\theta_{\alpha/2}$ or, equivalently, at a smaller p-value, again as illustrated in Figure 4.

Now recall that, at significance level $\alpha$, the frequentist confidence interval $\mathrm{CI}_{1-\alpha}$ is defined such that, on repeated samplings, a fraction $1 - \alpha$ of such intervals are expected to contain the true population parameter $\cos\theta_{(\nu,\Delta)}$. For an experiment that barely crosses the significance threshold, the estimates on repeated samplings will fall on either side of the threshold, with p-values and confidence intervals declared significant or not accordingly. When subjected to such statistical fluctuations, statistical significance thus becomes a “fickle” notion (Nuzzo 2014; Colquhoun 2014; Burnham and Anderson 2014; Halsey et al. 2015; Wasserstein and Lazar 2016).
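A minimal numerical sketch of these NHST quantities, assuming NumPy and SciPy and hypothetical helper names: the critical angle $\theta_\alpha$ is obtained from Student's quantile through $\cot\theta_\alpha = t_\alpha/\sqrt{\nu}$, the sensitivity is evaluated through the transformed bound $\Theta_{(\nu,\Delta)}(\theta_\alpha)$, and the two-tail interval (10) is checked to have a negative lower bound at the estimate $\cos\theta_{\alpha/2}$ and a vanishing one at $\cot\theta_{\alpha/2}$. The values $\nu = 20$ and $\Delta = 2$ are arbitrary illustrations; how $\Delta$ relates to the sample sizes (Appendix B) is not modeled here.

```python
import numpy as np
from scipy import stats


def critical_angle(alpha, nu):
    """One-tail critical angle theta_alpha, with cot(theta_alpha) = t_alpha / sqrt(nu)."""
    t_alpha = stats.t.isf(alpha, df=nu)
    return np.arctan2(np.sqrt(nu), t_alpha)


def central_p(theta, nu):
    """p(theta): integral of rho_(nu,0) from 0 to theta, evaluated through Student's t."""
    return stats.t.sf(np.sqrt(nu) / np.tan(theta), df=nu)


def sensitivity(alpha, nu, big_delta):
    """Power 1 - beta = p(Theta_(nu,Delta)(theta_alpha)) under the noncentral h-distribution."""
    theta_a = critical_angle(alpha, nu)
    c = big_delta / np.sqrt(big_delta ** 2 + nu)                  # Equation (9)
    big_theta = np.arctan2(np.sin(theta_a), np.cos(theta_a) - c)  # Equation (5)
    return central_p(big_theta, nu)


def ci_bounds(cos_est, alpha, nu):
    """Two-tail interval of Equation (10) around the estimate cos theta_(nu,D)."""
    theta_half = critical_angle(alpha / 2, nu)
    s2, cos_half = np.sin(theta_half) ** 2, np.cos(theta_half)
    center = s2 * cos_est
    half_width = cos_half * np.sqrt(1.0 - s2 * cos_est ** 2)
    return center - half_width, center + half_width


alpha, nu, big_delta = 0.05, 20, 2.0
theta_a = critical_angle(alpha, nu)
print(f"specificity: {1 - central_p(theta_a, nu):.6f}")
print(f"sensitivity (noncentral h): {sensitivity(alpha, nu, big_delta):.4f}")
# Conventional noncentral-t power, shown only for comparison; the two need not coincide.
print(f"power (noncentral t): {stats.nct.sf(stats.t.isf(alpha, nu), nu, big_delta):.4f}")

theta_half = critical_angle(alpha / 2, nu)
low_at_cos, _ = ci_bounds(np.cos(theta_half), alpha, nu)
low_at_cot, _ = ci_bounds(1.0 / np.tan(theta_half), alpha, nu)
print(f"lower bound at cos(theta_alpha/2): {low_at_cos:.4f}")   # negative
print(f"lower bound at cot(theta_alpha/2): {low_at_cot:.1e}")   # zero up to rounding
```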

Figure 3. Null hypothesis significance testing. Type I error at level α and power 1 − β for type II error at level β are represented as intercepts of the cumulative distribution functions of the central and noncentral h-distributions with the critical line at cos θ_α, respectively. Abbreviations: (n)chpdf — (non)central hypersphere probability distribution function; (n)chcdf — (non)central hypersphere cumulative distribution function.

Figure 4. Two-tail frequentist confidence intervals at $\alpha = .05$ for the continuum of two-sample, equal-size ($n_1 = n_2 = n$) estimates $\cos\theta_{(\nu,D)}$ are plotted in the upper panel. The null hypothesis $H_0$ neatly stands on the vertical line at $\cos\theta_{(\nu,D)} = 0$, from which one can draw critical values horizontally to the diagonal first and then vertically to the horizontal polar axis. The frequentist CI lower bound is still negative when the estimate $\cos\theta_{(\nu,D)}$ reaches the critical value $\cos\theta_{\alpha/2}$, as exemplified by the inner wedge. The Neyman–Pearson prescription thus fails to produce a significant confidence interval at the critical p-value when applied to the central and noncentral h-distributions. The CI lower bound crosses the zero threshold upward at a larger estimate $\cos\theta_{(\nu,D)} = \cot\theta_{\alpha/2} > \cos\theta_{\alpha/2}$ or, equivalently, a smaller p-value, as exemplified by the outer wedge. For an experiment that barely crosses the significance threshold, the estimates on repeated samplings will fall on either side of the threshold, with p-values and confidence intervals declared significant or not accordingly. When subjected to such statistical fluctuations, statistical significance thus becomes a “fickle” notion. Lower panel: corresponding power curves.

4. Bayesian Hypothesis Testing Framework

In this section, the Bayesian hypothesis testing framework is applied to the noncentral h-distribution (7), considered as a $\delta$-parameterized continuum of hypotheses $H_1$, with the noncentrality parameter $\delta$ re-expressed in geometric terms in Equation (4). The Fisher–Student’s central h-distribution (3) will be considered as $H_0$. In a Bayesian model selection framework, one is interested in the Bayes factor, defined as the ratio of conditional probabilities

$$\mathrm{BF} = \frac{P(\mathcal{D} \mid H_1)}{P(\mathcal{D} \mid H_0)}, \tag{11}$$

where $\mathcal{D}$ denotes the observed data. Since the central distribution (3) is easily factored out of the analytical expression for the noncentral hypersphere distribution (7), the Bayes factor of interest can be computed numerically via the integral

$$\mathrm{BF}_\nu(\theta) = \int_0^\pi L^{h}_{(\nu,\sqrt{\nu}\cot\theta_1)}(\theta)\,\Pr_\nu(\theta_1)\,d\theta_1 = \int_0^\pi \frac{1 - \cos\theta_1\cos\theta}{\left[1 - 2\cos\theta_1\cos\theta + \cos^2\theta_1\right]^{\frac{\nu+1}{2}}}\,\Pr_\nu(\theta_1)\,d\theta_1, \tag{12}$$

in terms of the likelihood ratio $L^{h}_{(\nu,\sqrt{\nu}\cot\theta_1)}(\theta)$ defined in (8), for which Equation (4) gives $\cos\theta_{(\nu,\delta)} = \cos\theta_1$, and of a prior $\Pr_\nu(\theta_1)$ to be specified. The relevant posterior probability is then given by

$$P_\nu(\cos\theta_1 \mid \cos\theta) = \mathrm{BF}_\nu^{-1}(\theta)\, L^{h}_{(\nu,\sqrt{\nu}\cot\theta_1)}(\theta)\,\Pr_\nu(\theta_1). \tag{13}$$

It has been abundantly argued in the literature that the choice of prior can have a decisive influence on Bayes factors and posterior probabilities. Instead of debating the relative merits of various kinds of priors, we shall restrict our attention to the following two: (1) the proper geometry-naive uniform prior $\Pr_\nu(\theta_1) = 1$, which should reproduce results stemming from the frequentist framework, and (2) the proper maximal entropy (maxent) prior $\Pr_\nu(\theta_1) = \rho_{(\nu,0)}(\theta_1)$ specified by the Fisher–Student’s central h-distribution (3) itself (Jaynes 1968). The latter choice of maxent prior arises naturally in the present geometrical setting, as it apportions most of the weight of evidence for $H_1$ to the equatorial band of a high-dimensional hypersphere, where the bulk of its density lies. This observation is most pertinent for the biomedical research realm, in which “empirical evidence suggests that most medical intervention effects are small or modest” (Pereira, Horwitz, and Ioannidis 2012). This maxent prior avoids Bartlett’s paradox and the information paradox (Wang and Liu 2016). The equal-tail Bayesian credible interval $\mathrm{CI}_{1-\alpha}$ for the posteriors is given by the integration limits of the integrand (13) that leave out $\alpha/2$ of its mass in each tail. We have plotted in the upper panels of Figure 5 the credible intervals of the posteriors $P_\nu(\cos\theta_1 \mid \cos\theta)$ for both priors. As expected, the credible interval for the geometry-naive uniform prior recapitulates the results of the frequentist framework: the credible interval straddles the diagonal without correction to the parameter estimate. More interesting is the effect of the maxent prior (3), which conservatively brings the Bayesian credible interval down below the diagonal. Although the Bayesian statistical hypothesis testing framework is not supposed to be discussed in terms of thresholds, it is interesting to note that the lower bounds of the Bayesian credible intervals in the positive upper quadrants of the upper panels cross the zero threshold upward when the Bayes factor reaches about two decibans (left lower panel) and six decibans (right lower panel), levels at which the Bayesian evidence is declared barely worth mentioning and substantial, respectively, according to Jeffreys (1998). Finally, recall that the Bayesian credible interval $\mathrm{CI}_{1-\alpha}$ for the parameter $\cos\theta_1$ is defined such that, given the observed data, there is a probability $1 - \alpha$ that the true parameter lies within it.
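As an illustration of Equations (12) and (13), the following sketch (assuming NumPy and SciPy; helper names hypothetical) computes $\mathrm{BF}_\nu(\theta)$ under the maxent prior $\Pr_\nu(\theta_1) = \rho_{(\nu,0)}(\theta_1)$ by adaptive quadrature, converts it to decibans ($10\log_{10}\mathrm{BF}$), and extracts an equal-tail credible interval for $\cos\theta_1$ from the posterior (13) tabulated on a grid. The grid-based quantile step, the value $\nu = 20$ and the observed t-values are our own assumptions, not the authors' procedure.

```python
import numpy as np
from scipy import integrate, special


def central_h_pdf(theta, nu):
    """Central h-density rho_(nu,0), used here as the maxent prior."""
    log_norm = (special.gammaln((nu + 1) / 2)
                - special.gammaln(0.5) - special.gammaln(nu / 2))
    return np.exp(log_norm) * np.sin(theta) ** (nu - 1)


def lr_h(theta, theta1, nu):
    """L^h_(nu, sqrt(nu) cot theta1)(theta): Equation (8) with cos theta_(nu,delta) = cos theta1."""
    c, x = np.cos(theta1), np.cos(theta)
    return (1.0 - c * x) / (1.0 - 2.0 * c * x + c ** 2) ** ((nu + 1) / 2)


def bayes_factor(theta, nu):
    """BF_nu(theta), Equation (12), with the maxent prior Pr_nu = rho_(nu,0)."""
    def integrand(theta1):
        return lr_h(theta, theta1, nu) * central_h_pdf(theta1, nu)
    value, _ = integrate.quad(integrand, 0.0, np.pi, limit=200)
    return value


def credible_interval(theta, nu, alpha=0.05, n_grid=4001):
    """Equal-tail credible interval for cos(theta1) from the posterior (13), on a grid."""
    th1 = np.linspace(0.0, np.pi, n_grid)
    unnorm = lr_h(theta, th1, nu) * central_h_pdf(th1, nu)
    # Trapezoidal cumulative mass; dividing by its total plays the role of BF^-1 in (13).
    cdf = np.concatenate(([0.0], np.cumsum(0.5 * (unnorm[1:] + unnorm[:-1]) * np.diff(th1))))
    cdf /= cdf[-1]
    th1_low_tail = np.interp(alpha / 2, cdf, th1)
    th1_high_tail = np.interp(1 - alpha / 2, cdf, th1)
    # Small theta1 means large cos(theta1), so the bounds swap when mapped to cosines.
    return np.cos(th1_high_tail), np.cos(th1_low_tail)


nu = 20
for t_obs in (0.0, 2.0, 4.0):
    theta = np.arctan2(np.sqrt(nu), t_obs)          # cot(theta) = t / sqrt(nu)
    bf = bayes_factor(theta, nu)
    lo, hi = credible_interval(theta, nu)
    print(f"t={t_obs:.1f}  cos(theta)={np.cos(theta):.3f}  BF={bf:.3f} "
          f"({10 * np.log10(bf):+.1f} decibans)  CI95=[{lo:.3f}, {hi:.3f}]")
```

Because the maxent prior is symmetric about $\theta_1 = \pi/2$, the resulting Bayes factor satisfies $\mathrm{BF}_\nu(\theta) = \mathrm{BF}_\nu(\pi - \theta)$, which is a convenient sanity check on any implementation.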

Figure 5. Left panels: geometry-naive uniform prior. Right panels: central hypersphere maxent prior. Upper panels: Bayesian credible intervals $\mathrm{CI}_{0.95}$ for the posterior $P_\nu(\cos\theta_1 \mid \cos\theta)$, together with their 0.5 quantiles. Lower panels: corresponding Bayes factors in decibans. The number of degrees of freedom or, equivalently, the hypersphere dimension $\nu$ can be deduced from the grayscale colorbars. The Bayesian credible intervals for the geometry-naive uniform prior recapitulate results of the frequentist framework. The central hypersphere maxent prior conservatively brings the Bayesian credible intervals down below the diagonal. Note that the lower bounds of the Bayesian credible intervals in the positive upper quadrants of the upper panels cross the zero threshold upward when the Bayes factor reaches about two decibans (left) and six decibans (right) in the lower panels, levels at which the Bayesian evidence is declared barely worth mentioning and substantial, respectively.

5. Applications

When discussing Bayesian analyses in this section, it shall be understood that they pertain solely to the maxent prior $\Pr_\nu(\theta_1) = \rho_{(\nu,0)}(\theta_1)$ specified by the Fisher–Student’s central h-distribution (Equation (3)).

The Open Science Collaboration project (Collaboration et al. Citation2015) estimated the reproducibility of psychological science to be weak: only 36% of 100 replication attempts produced statistically significant results in terms of the Neyman–Pearson NHST framework p-values, and only 47% of the original effect sizes were within the frequentist CI_0.95 confidence interval of the replication effect size. Etz and Vandekerckhove (Citation2016) argued that the failure of the Reproducibility Project could be attributed to overestimation of the original effect sizes and weak Bayesian evidence in the original studies. Considering the same 72 univariate test-based studies retained by Etz and Vandekerckhove (Citation2016), using the inferential t-statistics they provided in their Supporting Information, and further filtering down—for graphical purposes—to the 60 studies with sample size N less than 220, we have plotted in their original and replication empirical effect sizes against the posteriors’ Bayesian credible intervals. As the posteriors’ Bayesian credible intervals lay under the frequentist diagonal, a now very substantial 75% of the replication effect sizes fall within their respective original effect size posteriors’ credible intervals, indicating that the replication effort was not a failure from a Bayesian perspective. But since 60% of the original credible intervals’ lower bound are found to be negative, it has to be concluded that there was weak Bayesian evidence for the effect sizes in the original studies, in full accord with the conclusions of Etz and Vandekerckhove (Citation2016). shows thus a graphical illustration of the strength of the Bayesian framework: the central h-distribution maxent prior effects an appropriate weighing down of original overestimated effect sizes. 
In that respect, Ioannidis (2008) has argued that “if priors assume that small effects are plausible but large effects are implausible, Bayes Factors become most promising for small effects,” which is indeed the case here.
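The posterior credible intervals of Figure 6 can be sketched numerically for a single study. The following minimal Python example, using only the standard library, evaluates on a grid the posterior of θ₁, proportional to the integrand kernel of Equation (12) times the maxent prior of Equation (3), for one observed two-sample t-statistic, and returns an equal-tailed 95% credible interval. The function names, the grid size, the equal-tailed construction and the mapping to the effect size through Equation (19) of Appendix B, δ = cot θ₁ √(ν(1/n₁ + 1/n₂)), are our own illustrative assumptions and not necessarily the exact procedure behind Figure 6. With this mapping the observed effect size is t √(1/n₁ + 1/n₂).

```python
import math


def log_prior(theta1, nu):
    """Log of the maxent prior rho_(nu,0)(theta1), Equation (3)."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(0.5) - math.lgamma(nu / 2)
    return log_c + (nu - 1) * math.log(math.sin(theta1))


def log_kernel(theta, theta1, nu):
    """Log of L^h_(nu, sqrt(nu) cot theta1)(theta), the integrand kernel of Equation (12)."""
    c, c1 = math.cos(theta), math.cos(theta1)
    return math.log(1 - c1 * c) - (nu + 1) / 2 * math.log(1 - 2 * c1 * c + c1 * c1)


def effect_size_interval(t, n1, n2, level=0.95, grid=4000):
    """Posterior (lower, median, upper) effect sizes for an observed two-sample t."""
    nu = n1 + n2 - 2
    theta = math.atan2(math.sqrt(nu), t)            # t = sqrt(nu) cot(theta), Equation (1)
    h = math.pi / grid
    thetas1 = [(i + 0.5) * h for i in range(grid)]  # midpoints avoid sin(0) = sin(pi) = 0
    logw = [log_prior(a, nu) + log_kernel(theta, a, nu) for a in thetas1]
    top = max(logw)
    weights = [math.exp(v - top) for v in logw]     # rescaled to avoid overflow
    total = sum(weights)
    cdf, acc = [], 0.0
    for w in weights:
        acc += w / total
        cdf.append(acc)

    def quantile(q):
        for a, c in zip(thetas1, cdf):
            if c >= q:
                return a
        return thetas1[-1]

    scale = math.sqrt(nu * (1 / n1 + 1 / n2))       # Equation (19): delta = scale * cot(theta1)

    def to_delta(a):
        return scale / math.tan(a)

    tail = (1 - level) / 2
    # cot is decreasing on (0, pi): the upper theta1 quantile gives the lower delta bound.
    return to_delta(quantile(1 - tail)), to_delta(quantile(0.5)), to_delta(quantile(tail))


if __name__ == "__main__":
    t, n1, n2 = 3.0, 20, 20
    lo, med, hi = effect_size_interval(t, n1, n2)
    print(f"observed effect size: {t * math.sqrt(1 / n1 + 1 / n2):.3f}")
    print(f"posterior median: {med:.3f}, 95% interval: [{lo:.3f}, {hi:.3f}]")
```

For a positive t, the posterior median falls between zero and the observed effect size, which is the down-weighting of overestimated effects discussed above.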

Figure 6. The Open Science Collaboration Project original and replication effect sizes plotted against the posteriors’ ${CI}_{0.95}$ Bayesian credible intervals. Since 75% of the replication effect sizes fall within their original posteriors’ credible intervals, the replication project cannot be called a failure from a Bayesian perspective. Nevertheless, 60% of the original credible intervals’ lower bounds are negative, indicating weak Bayesian evidence for the original effects. The sample sizes can be deduced from the grayscale colorbar: the dot color refers to the original sample size; an upward (downward) pointing triangle indicates that the replication sample size was larger (smaller) than the original sample size, while a square indicates that the replication and original sample sizes were identical.

The microarray technology allows the cellular expression of thousands of genes to be interrogated simultaneously (Schulze and Downward 2001). For illustrative purposes, we accessed the NCBI Gene Expression Omnibus head and neck squamous cell carcinoma dataset GSE6631 (Kuriakose et al. 2007b), produced by Kuriakose et al. (2004a) and pertaining to 22 paired samples of normal versus cancerous tissue (N = 44 arrays, hence ν = N − 2 = 42); extracted the gene probe signals using the Robust Multiarray Analysis (RMA) algorithm (Irizarry et al. 2003); and, finally, filtered out the 10% of weakly expressed genes with the lowest variances. In Figure 7, the Bayes factor ${BF}_{ν = 42}(p)$ of Equation (12),

$$BF_ν(θ) = \int_0^π L^h_{(ν, \sqrt{ν}\cot θ_1)}(θ)\, \Pr_ν(θ_1)\, dθ_1 = \int_0^π \frac{1 - \cos θ_1 \cos θ}{[1 - 2\cos θ_1 \cos θ + \cos^2 θ_1]^{\frac{ν + 1}{2}}}\, \Pr_ν(θ_1)\, dθ_1,$$

reparameterized in terms of p(θ) as provided by Equation (40) in Appendix D, is plotted against the empirical p-value density histogram for 11,302 gene differential expression t-tests. The Bayes factor fits the empirical p-value density extremely well, both in the middle of the graph, where Bayesian evidence favors H₀, and at critical p-values at both ends of the range, where Bayesian evidence favors H₁. Recall that the p-value density of the Fisher–Student’s central h-distribution is the uniform density U(0, 1) on the range 0 ⩽ p ⩽ 1, a restatement of the probability integral transform theorem, which stipulates that p-values are uniformly distributed under the null hypothesis.
On this p-range, the denominator of the Bayes factor of Equation (11), $BF = P(D | H_1)/P(D | H_0)$, reduces to $P(D | H_0) = 1$, so that the Bayes factor simply reads $BF = P(D | H_1)$. More precisely, we prove in Appendix E that the Bayes factor BF_ν(p) is a bona fide probability density, modeling the H₁-associated nonuniform p-value density in lieu of the H₀-associated uniform p-value density U(0, 1).
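Equation (12) can be evaluated by one-dimensional quadrature. The following standard-library Python sketch tabulates BF_ν(θ) together with the one-sided p-value p(θ) of Equation (38), so that pairs (p(θ), BF_ν(θ)) could be overlaid on a p-value histogram in the manner of Figure 7. The composite Simpson rule, the grid size, the log-space assembly of the integrand (to avoid overflow for large ν) and the function names are our own choices, not necessarily those used to produce Figure 7.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule for f on [a, b] with n (made even) subintervals."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def log_norm(nu):
    """Log normalizing constant of the central h-distribution, Equation (3)."""
    return math.lgamma((nu + 1) / 2) - math.lgamma(0.5) - math.lgamma(nu / 2)


def prior(theta1, nu):
    """Maxent prior Pr_nu(theta1) = rho_(nu,0)(theta1), Equation (3)."""
    s = math.sin(theta1)
    if s < 1e-12:                      # the density vanishes at theta1 = 0 and pi
        return 0.0
    return math.exp(log_norm(nu) + (nu - 1) * math.log(s))


def bayes_factor(theta, nu, n=2000):
    """BF_nu(theta) of Equation (12); each integrand value is assembled in log space."""
    c, log_c = math.cos(theta), log_norm(nu)

    def integrand(theta1):
        s1 = math.sin(theta1)
        if s1 < 1e-12:
            return 0.0
        c1 = math.cos(theta1)
        return math.exp(log_c + (nu - 1) * math.log(s1) + math.log(1 - c1 * c)
                        - (nu + 1) / 2 * math.log(1 - 2 * c1 * c + c1 * c1))

    return simpson(integrand, 0.0, math.pi, n)


def p_of_theta(theta, nu, n=2000):
    """One-sided p-value p(theta) of Equation (38), by quadrature of the central density."""
    return simpson(lambda x: prior(x, nu), 0.0, theta, n)


if __name__ == "__main__":
    nu = 42
    for t in (0.0, 2.0, 3.3):
        theta = math.atan2(math.sqrt(nu), t)     # t = sqrt(nu) cot(theta), Equation (1)
        print(f"t = {t:3.1f}  p = {p_of_theta(theta, nu):.3g}  BF = {bayes_factor(theta, nu):.3g}")
```

In this sketch BF_ν falls below one near p = 1/2 and rises above one toward either end of the p-range, in line with the behavior described above.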

Figure 7. Bayes factor ${BF}_{ν}(p)$ against a microarray empirical p-value density histogram. The Bayes factor ${BF}_{ν = 42}(p)$ is plotted against the p-value density histogram for 11,302 gene differential expression t-tests of the GEO dataset with accession number GSE6631. On the range 0 ⩽ p ⩽ 1, the Fisher–Student’s central h-distribution p-value density reduces to the uniform density U(0, 1). On this range, the Bayes factor therefore simply reads $BF = P(D | H_1)/P(D | H_0) = P(D | H_1)$. The Bayes factor fits the empirical p-value density extremely well, both in the middle of the graph, where Bayesian evidence favors H₀, and at critical p-values, where Bayesian evidence favors H₁.

Since the microarray technology simultaneously interrogates the cellular expression of thousands of genes, one is immediately confronted with the multiple hypothesis testing problem within the frequentist NHST framework (Dudoit, Shaffer, and Boldrick 2003). Efron (2008) proposed a local false discovery rate (fdr), formulated in terms of a two-group mixture density model of “null genes” and “nonnull genes,” to address this problem. The present Bayesian approach affords a very economical definition of a local fdr in terms of the Bayes factor BF_ν(p), taking H₀ and H₁ to be a priori equally probable:

$$fdr(p) = \frac{P(D | H_0)}{P(D | H_0) + P(D | H_1)} = \frac{1}{1 + BF_ν(p)}. \tag{14}$$

In the upper panel of Figure 8 we have plotted the local fdr on a p-value log-scale, while the lower panel displays the local fdr for vanishing p-values on a log–log scale. Interestingly, the local fdr curves are independent of the corresponding hypersphere number of degrees of freedom ν and are essentially linear on a log–log scale in the relevant subdomain of vanishing p-values, a fact which lends a degree of universality to the present Bayesian hypothesis testing framework based on the noncentral h-distribution of Equation (7), $ρ^h_{(ν, δ)}(θ) = L^h_{(ν, δ)}(θ)\, ρ_{(ν, 0)}(θ)$, with the central Fisher–Student’s h-distribution of Equation (3) as geometrically meaningful maxent prior.
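Once the Bayes factor is available, Equation (14) is a one-liner. In the Python sketch below, which restates the quadrature of Equation (12) so as to be self-contained, the local fdr is evaluated directly from two-sample t-statistics through t = √ν cot θ of Equation (1), rather than from p-values, which avoids inverting p(θ). Equation (14) corresponds to equal prior probabilities for H₀ and H₁. The function names and the quadrature settings are illustrative assumptions.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule for f on [a, b] with n (made even) subintervals."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def bayes_factor(theta, nu, n=2000):
    """BF_nu(theta) of Equation (12) with the maxent prior of Equation (3)."""
    c = math.cos(theta)
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(0.5) - math.lgamma(nu / 2)

    def integrand(theta1):
        s1 = math.sin(theta1)
        if s1 < 1e-12:                 # the prior vanishes at theta1 = 0 and pi
            return 0.0
        c1 = math.cos(theta1)
        return math.exp(log_c + (nu - 1) * math.log(s1) + math.log(1 - c1 * c)
                        - (nu + 1) / 2 * math.log(1 - 2 * c1 * c + c1 * c1))

    return simpson(integrand, 0.0, math.pi, n)


def local_fdr(t, nu):
    """Local fdr of Equation (14) for a two-sample t-statistic with nu degrees of freedom."""
    theta = math.atan2(math.sqrt(nu), t)   # t = sqrt(nu) cot(theta), Equation (1)
    return 1.0 / (1.0 + bayes_factor(theta, nu))


if __name__ == "__main__":
    for t in (0.0, 2.0, 3.0, 4.0, 5.0):
        print(f"t = {t:3.1f}  fdr = {local_fdr(t, 42):.3g}")
```

Because BF_ν(θ) is symmetric under θ → π − θ, the resulting local fdr depends on |t| only, so both tails of the p-range are treated alike.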

Figure 8. Local false discovery rate $1 / (1 + {BF}_{ν} (p))$ in terms of the Bayes factor ${BF}_{ν} (p (θ)),$ where ν is the hypersphere dimension which can be deduced from the grayscale colorbar. Upper panel: two-tail local fdr. Note the signed log-scale on the horizontal axis. Lower panel: blow-up and extension of the right-hand tail of the upper panel on a log–log scale. The local fdr curves are independent of the hypersphere number of degrees of freedom ν and are essentially linear on a log–log scale in the subdomain covered in the latter panel, a fact which attributes a degree of universality to the Bayesian hypothesis testing framework based on the noncentral h-distribution, with the central Fisher–Student’s h-distribution as geometrically meaningful maxent prior.

6. Conclusion

Using simple geometric concepts such as vectors, projection matrices and trigonometric quantities, we have derived an analytical noncentral extension to the Fisher–Student’s central hypersphere distribution. Characterized by the single noncentrality parameter δ, this analytical noncentral h-distribution has allowed us to graphically assess the limitations of the Neyman–Pearson null hypothesis significance testing framework and the strengths of the Bayesian hypothesis analysis framework on a nontrivial, compact, one-dimensional parameter space. The central Fisher–Student’s hypersphere h-distribution has been argued on geometric grounds to be an appropriate maxent prior. The corresponding Bayes factor and posteriors have been shown to partly remedy the vexing problem of scientific reproducibility, by down-weighting the overestimated effect sizes that a frequentist analysis would inevitably produce. The Bayes factor has been shown to adequately model the empirical p-value density of a multiple hypothesis testing dataset produced by the microarray technology, and to provide an easy assessment of a local false discovery rate. The noncentral h-distribution has thus allowed us to address the intricacies of both the Neyman–Pearson null hypothesis significance testing framework and the Bayesian hypothesis analysis framework while avoiding the unwieldy noncentral t-distribution and the problems arising from working in higher-dimensional parameter spaces. As such, the noncentral hypersphere h-distribution is relevant both as a practical tool and as a pedagogical tool for a broad audience.

References

Baharev, A., Schichl, H., and Rév, E. (2017), “Computing the Noncentral-F Distribution and the Power of the F-test with Guaranteed Accuracy,” Computational Statistics, 32, 763–779.
Burnham, K. P., and Anderson, D. (2014), “P values are Only an Index to Evidence: 20th vs. 21st-century Statistical Science,” Ecology, 95, 627–630.
Chance, W. A. (1986), “A Geometric Derivation of the Distribution of the Correlation Coefficient |r| when ρ = 0,” American Mathematical Monthly, 93, 94–98.
Collaboration, O. S. et al. (2015), “Estimating the Reproducibility of Psychological Science,” Science, 349, aac4716.
Colquhoun, D. (2014), “An Investigation of the False Discovery Rate and the Misinterpretation of p-values,” Royal Society Open Science, 1, 140216.
Cumming, G., and Finch, S. (2001), “A Primer on the Understanding, Use, and Calculation of Confidence Intervals that are Based on Central and Noncentral Distributions,” Educational and Psychological Measurement, 61, 532–574.
Dudoit, S., Shaffer, J. P., and Boldrick, J. C. (2003), “Multiple Hypothesis Testing in Microarray Experiments,” Statistical Science, 18, 71–103.
Durbin, J., and Kendall, M. G. (1951), “The Geometry of Estimation,” Biometrika, 38, 150–158.
Efron, B. (2008), “Microarrays, Empirical Bayes and the Two-groups Model,” Statistical Science, 23, 1–22.
Ellis, P. D. (2010), The Essential Guide to Effect Sizes: Statistical Power, Meta-analysis, and the Interpretation of Research Results, Cambridge, UK: Cambridge University Press.
Etz, A., and Vandekerckhove, J. (2016), “A Bayesian Perspective on the Reproducibility Project: Psychology,” PLoS ONE, 11, e0149794.
Fisher, R. A., et al. (1925), “Applications of Student Distribution,” Metron, 5, 90–104.
Gönen, M., Johnson, W. O., Lu, Y., and Westfall, P. H. (2005), “The Bayesian Two-Sample t Test,” The American Statistician, 59, 252–257.
Goodman, S. N. (1999a), “Toward Evidence-based Medical Statistics. 1: The p Value Fallacy,” Annals of Internal Medicine, 130, 995–1004.
Goodman, S. N. (1999b), “Toward Evidence-based Medical Statistics. 2: The Bayes Factor,” Annals of Internal Medicine, 130, 1005–1013.
Gorroochurn, P. (2016), Classic Topics on the History of Modern Mathematical Statistics: From Laplace to More Recent Times. New York: Wiley.
Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., and Altman, D. G. (2016), “Statistical Tests, p Values, Confidence Intervals, and Power: A Guide to Misinterpretations,” European Journal of Epidemiology, 31, 337–350.
Halsey, L. G., Curran-Everett, D., Vowler, S. L., and Drummond, G. B. (2015), “The Fickle p Value Generates Irreproducible Results,” Nature Methods, 12, 179–185.
Ioannidis, J. P. (2008), “Why Most Discovered True Associations are Inflated,” Epidemiology, 19, 640–648.
Irizarry, R. A., Bolstad, B. M., Collin, F., Cope, L. M., Hobbs, B., and Speed, T. P. (2003), “Summaries of Affymetrix Genechip Probe Level Data,” Nucleic Acids Research, 31, e15.
Jaynes, E. T. (1968), “Prior Probabilities,” IEEE Transactions on Systems Science and Cybernetics, 4, 227–241.
Jeffreys, H. (1998), The Theory of Probability, Oxford, UK: Oxford University Press.
Kuriakose, M., Chen, W., He, Z., Sikora, A., Zhang, P., Zhang, Z., Qiu, W., Hsu, D., McMunn-Coffran, C., Brown, S. et al. (2007b, January), “Expression Data from Head and Neck Squamous Cell Carcinoma,” available at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE6631.
Kuriakose, M., Chen, W., He, Z., Sikora, A., Zhang, P., Zhang, Z., Qiu, W., Hsu, D., McMunn-Coffran, C., Brown, S. et al. (2004a), “Selection and Validation of Differentially Expressed Genes in Head and Neck Cancer,” Cellular and Molecular Life Sciences, 61, 1372–1383.
Mazliak, L. (2015), “Poincaré’s Odds,” in Henri Poincaré, 1912–2012, eds. B. Duplantier and V. Rivasseau, Berlin: Springer, pp. 151–192.
Nuzzo, R. (2014), “Statistical Errors,” Nature, 506, 150–152.
Pereira, T. V., Horwitz, R. I., and Ioannidis, J. P. (2012), “Empirical Evaluation of Very Large Treatment Effects of Medical Interventions,” JAMA, 308, 1676–1684.
Rothman, K. J. (2016), “Disengaging from Statistical Significance,” European Journal of Epidemiology, 31, 443–444.
Schulze, A., and Downward, J. (2001), “Navigating Gene Expression Using Microarrays: A Technology Review,” Nature Cell Biology, 3, E190–E195.
Walck, C. (2007), Handbook on Statistical Distributions for Experimentalists, Stockholm: University of Stockholm.
Wang, M., and Liu, G. (2016), “A Simple Two-sample Bayesian t-test for Hypothesis Testing,” The American Statistician, 70, 195–201.
Wasserstein, R. L., and Lazar, N. A. (2016), “The ASA’s Statement on p-values: Context, Process, and Purpose,” The American Statistician, 70, 129–133.

A. Projector Matrices

The N × N matrix projectors P_C, P_B and P_W, which respectively project the observation N-vector o on the overall center-of-mass, the between-class variance hyperplane and the within-class variance hyperplane, are given by the matrices

$$\begin{aligned} P_C &= \frac{1}{N}(\mathbf{1}\mathbf{1}^t)_{N \times N}, \\ P_B &= \operatorname{diag}\Big(\frac{1}{n_1}(\mathbf{1}\mathbf{1}^t)_{n_1 \times n_1}, \frac{1}{n_2}(\mathbf{1}\mathbf{1}^t)_{n_2 \times n_2}, \dots, \frac{1}{n_{ν_1 + 1}}(\mathbf{1}\mathbf{1}^t)_{n_{ν_1 + 1} \times n_{ν_1 + 1}}\Big) - P_C, \\ P_W &= \operatorname{diag}\Big(I_{n_1} - \frac{1}{n_1}(\mathbf{1}\mathbf{1}^t)_{n_1 \times n_1}, I_{n_2} - \frac{1}{n_2}(\mathbf{1}\mathbf{1}^t)_{n_2 \times n_2}, \dots, I_{n_{ν_1 + 1}} - \frac{1}{n_{ν_1 + 1}}(\mathbf{1}\mathbf{1}^t)_{n_{ν_1 + 1} \times n_{ν_1 + 1}}\Big), \end{aligned} \tag{15}$$

where diag(⋯) denotes the block-diagonal matrix with the stated blocks, and the class cardinalities $[n_1, n_2, \dots, n_{ν_1 + 1}]$ partition the total of $N = \sum_{i = 1}^{ν_1 + 1} n_i$ observations composing the observation N-vector o into ν₁ + 1 classes. The projectors obey the resolution of the identity

$$I_N = P_C + P_B + P_W, \tag{16}$$

with I_N the N × N identity matrix. Recall that a matrix projector obeys the defining property P² = P, so that its eigenvalues satisfy λ(λ − 1) = 0 and are thus restricted to the values {0, 1}; its rank equals the number of its unit eigenvalues, that is, its trace. The projectors P_C, P_B and P_W have ranks 1, ν₁ and ν₂ = N − ν₁ − 1, respectively. The two-sample case corresponds to ν₁ = 1.
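The projectors of Equation (15) are easy to build and check numerically. The following standard-library Python sketch constructs P_C, P_B and P_W for arbitrary class sizes, so that idempotency, the resolution of the identity (16) and the ranks (computed as traces) can be verified; it also evaluates the two-sample statistic in the projection form of Equation (1), t = [ν oᵗP_B o / oᵗP_W o]^{1/2}, which agrees in absolute value with the textbook pooled-variance t. Dense nested lists and the helper names are purely illustrative choices; this is not an efficient implementation.

```python
import math


def projectors(sizes):
    """P_C, P_B and P_W of Equation (15), as nested lists, for class sizes [n_1, ..., n_{nu1+1}]."""
    N = sum(sizes)
    label = [k for k, n in enumerate(sizes) for _ in range(n)]   # class index of each observation
    PC = [[1.0 / N] * N for _ in range(N)]
    # Block-diagonal matrix whose k-th block is the class averaging operator (1/n_k) 1 1^t
    D = [[1.0 / sizes[label[i]] if label[i] == label[j] else 0.0 for j in range(N)]
         for i in range(N)]
    PB = [[D[i][j] - PC[i][j] for j in range(N)] for i in range(N)]
    PW = [[(1.0 if i == j else 0.0) - D[i][j] for j in range(N)] for i in range(N)]
    return PC, PB, PW


def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def trace(A):
    """For a projector, the trace equals the rank (number of unit eigenvalues)."""
    return sum(A[i][i] for i in range(len(A)))


def quad_form(o, A):
    """Quadratic form o^t A o."""
    n = len(o)
    return sum(o[i] * A[i][j] * o[j] for i in range(n) for j in range(n))


def t_statistic(o, n1, n2):
    """|t| from the projection form of Equation (1), with nu = N - 2."""
    _, PB, PW = projectors([n1, n2])
    return math.sqrt((n1 + n2 - 2) * quad_form(o, PB) / quad_form(o, PW))


if __name__ == "__main__":
    PC, PB, PW = projectors([3, 4, 5])          # three classes, so nu1 = 2 and N = 12
    print("ranks:", [round(trace(P)) for P in (PC, PB, PW)])   # 1, nu1 and N - nu1 - 1
    print("t:", t_statistic([1.0, 2.0, 3.0, 5.0, 6.0, 8.0], 3, 3))
```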

B. Determination of the Sampling Distribution

Using the resolution of the identity (Equation (16)), consider the situation in which the centered, within-class uniform vector

$$\begin{aligned} (I_N - P_C)\begin{pmatrix} δ_1 \mathbf{1}_{n_1} \\ δ_2 \mathbf{1}_{n_2} \end{pmatrix} &= (P_B + P_W)\begin{pmatrix} δ_1 \mathbf{1}_{n_1} \\ δ_2 \mathbf{1}_{n_2} \end{pmatrix} = P_B \begin{pmatrix} δ_1 \mathbf{1}_{n_1} \\ δ_2 \mathbf{1}_{n_2} \end{pmatrix} \\ &= T_B T_B^t \begin{pmatrix} δ_1 \mathbf{1}_{n_1} \\ δ_2 \mathbf{1}_{n_2} \end{pmatrix} = \frac{δ_1 - δ_2}{\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\, T_B \end{aligned} \tag{17}$$

along the unit polar axis T_B is added to an observation vector drawn from a population with null noncentrality parameter. (Since this vector is constant within each class, P_W annihilates it; for ν₁ = 1, the rank-one projector reads P_B = T_B T_Bᵗ.) In the two-group comparison context, this imparts the nonvanishing normalized effect size δ₁ to the first subpopulation and the nonvanishing normalized effect size δ₂ to the second. Under these circumstances, the two-sample t-statistic of Equation (1), $t = [ν\, o^t P_B o / o^t P_W o]^{1/2} = \sqrt{ν}\cot θ$, with $-\infty \leq t \leq \infty$, $0 \leq θ \leq π$ and ν = N − 2, distributes according to the noncentral h-distribution $ρ^h_{(ν, Δ)}(θ)$, with noncentrality parameter Δ given by the factor multiplying T_B above, that is,

$$Δ = T_B^t \begin{pmatrix} δ_1 \mathbf{1}_{n_1} \\ δ_2 \mathbf{1}_{n_2} \end{pmatrix} = \frac{δ}{\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \quad δ = δ_1 - δ_2, \qquad Δ = \sqrt{\frac{n}{2}}\, δ \ \text{ if } n_1 = n_2 = n. \tag{18}$$

The quantity

$$\begin{aligned} \cot θ_{(ν, Δ)} = \frac{Δ}{\sqrt{ν}} &= \frac{δ}{\sqrt{ν(\frac{1}{n_1} + \frac{1}{n_2})}} \\ &= \sqrt{\frac{n}{n - 1}}\, \frac{δ}{2} \quad \text{if } n_1 = n_2 = n,\ ν = 2n - 2, \\ &\simeq \frac{δ}{2} \quad \text{for large } n_1 = n_2 = n \end{aligned} \tag{19}$$

is needed to determine the noncentrality parameter as reexpressed in trigonometric terms:

$$\begin{aligned} \cos θ_{(ν, Δ)} &= \frac{Δ/\sqrt{ν}}{\sqrt{(Δ/\sqrt{ν})^2 + 1}} = \frac{Δ}{\sqrt{Δ^2 + ν}} = \frac{δ}{\sqrt{δ^2 + ν(\frac{1}{n_1} + \frac{1}{n_2})}} \\ &= \frac{δ}{\sqrt{δ^2 + 4\frac{n - 1}{n}}} \quad \text{if } n_1 = n_2 = n,\ ν = 2n - 2, \\ &\simeq \frac{δ}{\sqrt{δ^2 + 4}} = \cos θ_{(ν, \sqrt{ν}δ/2)} \quad \text{for large } n_1 = n_2 = n. \end{aligned} \tag{20}$$

We have carried out this tedious enumeration of square-root factors because they are often introduced summarily in the literature, with no reference to their simple geometric origin; see, e.g., Ellis (2010).
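The chain of square-root factors in Equations (17) to (20) can be checked in a few lines. The Python sketch below builds the unit polar axis T_B for the two-sample case, projects the class-wise shift (δ₁1_{n₁}, δ₂1_{n₂}) onto it to recover Δ of Equation (18), and evaluates cot θ_(ν,Δ) and cos θ_(ν,Δ) of Equations (19) and (20). The explicit form of T_B, with positive entries on the first class, is our assumed sign convention; any unit vector spanning the same rank-one between-class space gives the same |Δ|. The function names are hypothetical.

```python
import math


def polar_axis(n1, n2):
    """Unit vector T_B spanning the rank-one between-class space (nu1 = 1).
    Sign convention (class 1 positive) is an assumption of this sketch."""
    k = math.sqrt(1 / n1 + 1 / n2)
    return [1 / (n1 * k)] * n1 + [-1 / (n2 * k)] * n2


def noncentral_geometry(delta1, delta2, n1, n2):
    """Delta (Equation 18), cot theta (Equation 19) and cos theta (Equation 20)."""
    nu = n1 + n2 - 2
    Delta = (delta1 - delta2) / math.sqrt(1 / n1 + 1 / n2)
    cot_theta = Delta / math.sqrt(nu)
    cos_theta = Delta / math.sqrt(Delta ** 2 + nu)
    return Delta, cot_theta, cos_theta


if __name__ == "__main__":
    n1, n2, d1, d2 = 12, 15, 0.8, 0.3
    T = polar_axis(n1, n2)
    shift = [d1] * n1 + [d2] * n2
    Delta_projected = sum(a * b for a, b in zip(T, shift))   # T_B^t (delta1 1, delta2 1)
    print(Delta_projected, noncentral_geometry(d1, d2, n1, n2))
```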

C. Noncentral Hypersphere F-distribution

The F-statistic is defined in both projection and geometrical terms via the variance ratio

$$F_{(ν_1, ν_2)}(θ) = \frac{ν_2}{ν_1} \frac{o^t P_B o}{o^t P_W o} = \frac{ν_2}{ν_1} \frac{\cos^2 θ}{\sin^2 θ}, \tag{21}$$

where ν₁ and ν₂ are the dimensions of the between-class and within-class variance hyperplanes, respectively. The matrix projectors P_B and P_W are defined in Appendix A. Geometrically, θ is the angle between the observation vector o and the between-class variance hyperplane, and π/2 − θ the angle between o and the orthogonal pooled within-class variance hyperplane. When the observation o has a uniform, class-independent, isotropic distribution on the central unit-radius hypersphere $S^{ν_1 + ν_2}$, F distributes according to the central (Λ = 0) Fisher–Snedecor F-distribution

$$ρ_{(ν_1, ν_2, Λ = 0)}(θ) = 2\frac{Γ(\frac{ν_1 + ν_2}{2})}{Γ(\frac{ν_1}{2}) Γ(\frac{ν_2}{2})} \cos^{ν_1 - 1} θ \sin^{ν_2 - 1} θ, \quad 0 \leq θ \leq \frac{π}{2}. \tag{22}$$

The NHST p-value is obtained by computing the probability of an observation lying at an angular distance θ′ ⩽ θ from the between-class variance hyperplane:

$$p(θ) = 2\frac{Γ(\frac{ν_1 + ν_2}{2})}{Γ(\frac{ν_1}{2}) Γ(\frac{ν_2}{2})} \int_0^θ \cos^{ν_1 - 1} θ' \sin^{ν_2 - 1} θ'\, dθ', \quad 0 \leq θ \leq \frac{π}{2}, \tag{23}$$

an integral which can be carried out analytically: see Appendix D.
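Equations (21) to (23) translate directly into code. The following standard-library Python sketch maps an observed F-statistic to the angle θ by inverting Equation (21), and evaluates the upper-tail p-value of Equation (23) by Simpson quadrature. It is meant only as a numerical cross-check of the closed forms of Appendix D; the quadrature rule, grid size and function names are illustrative choices.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule for f on [a, b] with n (made even) subintervals."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def f_density_theta(theta, nu1, nu2):
    """Central Fisher-Snedecor density in the angle theta, Equation (22)."""
    log_c = (math.log(2) + math.lgamma((nu1 + nu2) / 2)
             - math.lgamma(nu1 / 2) - math.lgamma(nu2 / 2))
    return math.exp(log_c) * math.cos(theta) ** (nu1 - 1) * math.sin(theta) ** (nu2 - 1)


def theta_from_F(F, nu1, nu2):
    """Invert Equation (21): cot^2(theta) = nu1 F / nu2, with 0 <= theta <= pi/2."""
    return math.atan2(math.sqrt(nu2), math.sqrt(nu1 * F))


def f_p_value(F, nu1, nu2, n=2000):
    """Upper-tail p-value P(F' >= F) of Equation (23), by quadrature in theta."""
    theta = theta_from_F(F, nu1, nu2)
    return simpson(lambda x: f_density_theta(x, nu1, nu2), 0.0, theta, n)


if __name__ == "__main__":
    print(f_p_value(3.0, 2, 10))
```

Since F decreases as θ grows, integrating from 0 to θ accumulates exactly the observations at least as extreme as the observed one.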
It will be convenient in what follows to make the change of variable R = cos θ in order to reexpress the central Fisher–Snedecor F-distribution (Equation (22)) as

$$ρ_{(ν_1, ν_2, Λ = 0)}(R) = 2\frac{Γ(\frac{ν_1 + ν_2}{2})}{Γ(\frac{ν_1}{2}) Γ(\frac{ν_2}{2})} R^{ν_1 - 1} (1 - R^2)^{\frac{ν_2 - 2}{2}}, \quad 0 \leq R \leq 1, \tag{24}$$

with R a correlation-like parameter. Consider now the projection of the observation vector o on the between-class variance hyperplane. Since P_B is a projection operator of rank ν₁, it is invariant under rotations within its unit-eigenvalue eigenspace. This eigenspace degeneracy allows us to perform the noncentral hypersphere cosine translation

$$\cos θ_{(ν_2, Λ)} = \sqrt{Λ / (Λ + ν_2)}, \tag{25}$$

expressed in terms of the noncentrality parameter Λ of the noncentral F-distribution, along a unit vector $T_Λ$ of our choosing in this eigenspace. In the coordinate system centered on the hypersphere, we thus define the polar coordinates

$$o^t (T_Λ T_Λ^t) o = R^2 \cos^2 φ, \qquad o^t (P_B - T_Λ T_Λ^t) o = R^2 \sin^2 φ. \tag{26}$$

The 3-vector $(R\cos φ, R\sin φ, \sqrt{1 - R^2})$ is easily argued to distribute according to the joint distribution

$$\begin{aligned} ρ_{(ν_1, ν_2, 0)}(R, φ) &= ρ_{(ν_1, ν_2, 0)}(R) \times ρ_{(ν_1 - 1, 0)}(φ), \quad 0 \leq R \leq 1,\ 0 \leq φ \leq π,\ ν_1 \geq 2, \\ &= 2\frac{Γ(\frac{ν_1 + ν_2}{2})}{Γ(\frac{ν_1}{2}) Γ(\frac{ν_2}{2})} R^{ν_1 - 1} (1 - R^2)^{\frac{ν_2 - 2}{2}} \times \frac{Γ(\frac{ν_1}{2})}{Γ(\frac{1}{2}) Γ(\frac{ν_1 - 1}{2})} \sin^{ν_1 - 2} φ, \end{aligned} \tag{27}$$

where $ρ_{(ν_1 - 1, 0)}(φ)$ is the Fisher–Student’s central h-distribution of Equation (3) with ν = ν₁ − 1. When the noncentrality parameter Λ is nonvanishing, the observation vector as assessed in the observation coordinate system similarly reads $(r\cos ψ, r\sin ψ, \sqrt{1 - r^2})$, with its polar coordinates r and ψ both functions of the noncentrality parameter through Equation (25).
Indeed, when the latter 3-vector is recentered on the translated hypersphere and renormalized, it reads

$$(r\cos ψ, r\sin ψ, \sqrt{1 - r^2}) \Rightarrow \frac{(r\cos ψ - \cos θ_{(ν_2, Λ)},\ r\sin ψ,\ \sqrt{1 - r^2})}{[1 - 2r\cos ψ \cos θ_{(ν_2, Λ)} + \cos^2 θ_{(ν_2, Λ)}]^{1/2}}, \tag{28}$$

which leads us to consider the polar coordinate transformation

$$R = \left[\frac{r^2 - 2r\cos ψ \cos θ_{(ν_2, Λ)} + \cos^2 θ_{(ν_2, Λ)}}{1 - 2r\cos ψ \cos θ_{(ν_2, Λ)} + \cos^2 θ_{(ν_2, Λ)}}\right]^{1/2}, \qquad φ = \operatorname{arccot}\left(\frac{r\cos ψ - \cos θ_{(ν_2, Λ)}}{r\sin ψ}\right). \tag{29}$$

The transformation Jacobian is given by

$$J_{(ν_1, ν_2, Λ)}(r, ψ) = \frac{\partial(R, φ)}{\partial(r, ψ)} = \frac{r\,(1 - r\cos ψ \cos θ_{(ν_2, Λ)})}{[r^2 - 2r\cos ψ \cos θ_{(ν_2, Λ)} + \cos^2 θ_{(ν_2, Λ)}]^{1/2}\, [1 - 2r\cos ψ \cos θ_{(ν_2, Λ)} + \cos^2 θ_{(ν_2, Λ)}]^{3/2}}, \tag{30}$$

where the denominator term $[r^2 - 2r\cos ψ \cos θ_{(ν_2, Λ)} + \cos^2 θ_{(ν_2, Λ)}]$ vanishes at the center of the translated hypersphere, that is, at $(r, ψ) = (\cos θ_{(ν_2, Λ)}, 0)$. Introducing the transformation (29) into the joint distribution (27) and regrouping the various terms, we find that

$$ρ^h_{(ν_1, ν_2, Λ)}(r, ψ) = L^h_{(ν_1, ν_2, Λ)}(r, ψ)\, ρ_{(ν_1, ν_2, 0)}(r), \quad ν_1 \geq 2, \tag{31}$$

where $ρ_{(ν_1, ν_2, 0)}(r)$ is the central Fisher–Snedecor F-distribution of Equation (24), and where the likelihood function $L^h_{(ν_1, ν_2, Λ)}$ is given by

$$L^h_{(ν_1, ν_2, Λ)}(r, ψ) = \frac{Γ(\frac{ν_1}{2})}{Γ(\frac{1}{2}) Γ(\frac{ν_1 - 1}{2})} \frac{(1 - r\cos ψ \cos θ_{(ν_2, Λ)}) \sin^{ν_1 - 2} ψ}{[1 - 2r\cos ψ \cos θ_{(ν_2, Λ)} + \cos^2 θ_{(ν_2, Λ)}]^{\frac{ν_1 + ν_2}{2}}}, \tag{32}$$

with neat cancellation of all singular terms. When Λ = 0, we have $\cos θ_{(ν_2, Λ)} = 0$, ψ = φ and r = R, and $ρ^h_{(ν_1, ν_2, Λ)}(r, ψ)$ reduces to the central Fisher–Snedecor joint distribution $ρ_{(ν_1, ν_2, 0)}(R, φ)$ defined by Equation (27). The density $ρ^h_{(ν_1, ν_2, Λ)}(r, ψ)$ can be drawn on a two-dimensional half-disk as in Figure 9, with the polar coordinates (r, ψ) summarizing the between-class variance hyperplane coordinates. We are ultimately interested in the distribution of $F_{(ν_1, ν_2, Λ)}(r)$, parameterized by the correlation-like parameter r only.
The desired distribution function is obtained by computing the marginal distribution of Equation (31), that is, by integrating it with respect to the polar coordinate angle ψ, as illustrated graphically in Figure 9. We thus have that $F_{(ν_1, ν_2, Λ)}(r)$ distributes according to

$$ρ^h_{(ν_1, ν_2, Λ)}(r) = L^h_{(ν_1, ν_2, Λ)}(r)\, ρ_{(ν_1, ν_2, 0)}(r), \quad ν_1 \geq 2, \tag{33}$$

with

$$L^h_{(ν_1, ν_2, Λ)}(r) = \int_{ψ = 0}^{π} L^h_{(ν_1, ν_2, Λ)}(r, ψ)\, dψ. \tag{34}$$
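The marginalization of Equations (31) to (34) reduces, for each r, to a one-dimensional quadrature over ψ. The following standard-library Python sketch evaluates $ρ^h_{(ν_1, ν_2, Λ)}(r)$ in this way, assuming ν₁ ≥ 2 (as required by Equation (31)) and ν₂ ≥ 2 (which keeps Equation (24) finite at r = 1), and checks numerically that the result integrates to one for the parameters of Figure 9 (ν₁ = 3, ν₂ = 20, Λ = 1). The Simpson rule, the grid sizes and the function names are our own illustrative choices.

```python
import math


def simpson(f, a, b, n):
    """Composite Simpson rule for f on [a, b] with n (made even) subintervals."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def central_density_r(r, nu1, nu2):
    """Central Fisher-Snedecor density in r = cos(theta), Equation (24); assumes nu2 >= 2."""
    log_c = (math.log(2) + math.lgamma((nu1 + nu2) / 2)
             - math.lgamma(nu1 / 2) - math.lgamma(nu2 / 2))
    return math.exp(log_c) * r ** (nu1 - 1) * (1 - r * r) ** ((nu2 - 2) / 2)


def likelihood_r_psi(r, psi, nu1, nu2, lam):
    """L^h(r, psi) of Equation (32), with cos(theta_(nu2, Lambda)) from Equation (25)."""
    c = math.sqrt(lam / (lam + nu2))
    log_k = math.lgamma(nu1 / 2) - math.lgamma(0.5) - math.lgamma((nu1 - 1) / 2)
    cos_psi = math.cos(psi)
    num = (1 - r * cos_psi * c) * math.sin(psi) ** (nu1 - 2)
    den = (1 - 2 * r * cos_psi * c + c * c) ** ((nu1 + nu2) / 2)
    return math.exp(log_k) * num / den


def h_density_r(r, nu1, nu2, lam, n_psi=200):
    """Noncentral hypersphere density of Equations (33)-(34): integrate Equation (31) over psi."""
    L = simpson(lambda psi: likelihood_r_psi(r, psi, nu1, nu2, lam), 0.0, math.pi, n_psi)
    return L * central_density_r(r, nu1, nu2)


if __name__ == "__main__":
    total = simpson(lambda r: h_density_r(r, 3, 20, 1.0), 0.0, 1.0, 200)
    print("integral of rho^h over 0 <= r <= 1:", total)
```

The normalization follows from Equation (31) being a change of variables of the normalized joint distribution (27), so a value close to one is the expected outcome.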

Figure 9. Left upper panel: central Fisher–Snedecor F-distribution $ρ_{(ν_{1} = 3, ν_{2} = 20, Λ = 0)} (r, ψ)$ plotted on the between-class variance hyperplane summarized by the polar coordinates (r, ψ). Right upper panel: noncentral h-distribution $ρ_{(ν_{1} = 3, ν_{2} = 20, Λ = 1)}^{h} (r, ψ)$ similarly plotted on the between-class variance hyperplane. Left lower panel: central Fisher–Snedecor F-distribution $ρ_{(ν_{1} = 3, ν_{2} = 20, Λ = 0)} (r)$ as plotted along the correlation-like r axis: it is the marginal distribution of the distribution above it, graphically obtained by circularly sweeping the distribution radar-like from ψ = −π to ψ = 0 and projecting the sweep result on the positive axis of the lower panel. Right lower panel: noncentral h-distribution $ρ_{(ν_{1} = 3, ν_{2} = 20, Λ = 1)}^{h} (r)$ as plotted along the correlation-like r axis: again, it is the marginal distribution of the distribution above it.

The special case ν₁ = 1 is given by

$$ρ^h_{(ν_1 = 1, ν_2, Λ)}(r) = \sum_{r' \in \{r, -r\}} ρ^h_{(ν_2, Λ)}(r'), \tag{35}$$

where $ρ^h_{(ν_2, Λ)}(r)$ is the noncentral h-distribution of Equation (7), $ρ^h_{(ν, δ)}(θ) = L^h_{(ν, δ)}(θ)\, ρ_{(ν, 0)}(θ)$, reexpressed in terms of the correlation-like parameter r:

$$ρ^h_{(ν_2, Λ)}(r) = \frac{Γ(\frac{ν_2 + 1}{2})}{Γ(\frac{1}{2}) Γ(\frac{ν_2}{2})} \frac{(1 - r\cos θ_{(ν_2, Λ)})(1 - r^2)^{\frac{ν_2 - 2}{2}}}{[1 - 2r\cos θ_{(ν_2, Λ)} + \cos^2 θ_{(ν_2, Λ)}]^{\frac{ν_2 + 1}{2}}}, \quad -1 \leq r \leq 1. \tag{36}$$

For the sake of comparison, we state without proof that the noncentral F-distribution can be expressed as

$$ρ^F_{(ν_1, ν_2, Λ)}(r) = L^F_{(ν_1, ν_2, Λ)}(r)\, ρ_{(ν_1, ν_2, 0)}(r), \tag{37}$$

with the likelihood function $L^F_{(ν_1, ν_2, Λ)}(r)$ defined by the cumbersome infinite sum (Walck 2007)

$$L^F_{(ν_1, ν_2, Λ)}(r) = e^{-\frac{Λ}{2}} \sum_{j = 0}^{\infty} \frac{1}{j!}\left(\frac{Λ}{2}\right)^j \frac{Γ(\frac{ν_1 + ν_2 + 2j}{2})}{Γ(\frac{ν_1 + ν_2}{2})} \frac{Γ(\frac{ν_1}{2})}{Γ(\frac{ν_1 + 2j}{2})}\, r^{2j}.$$

Equations (33) and (37) are compared graphically in Figure 10. The latter distribution is known to generate numerical instabilities.
Indeed, Baharev, Schichl, and Rév (2017) state that “computations involving the noncentral F-distribution are notoriously difficult to implement properly in floating-point arithmetic: catastrophic loss of precision, floating-point underflow and overflow, drastically increasing computation time and program hang-ups, and instability due to numerical cancellation have all been reported.” The simpler analytical expression (33) for the noncentral hypersphere distribution should help avoid such numerical instabilities, while allowing easier exploitation of the Bayesian hypothesis testing framework, as was carried out in Section 4. Finally, note that definitions of the noncentrality parameter Λ vary in the literature: numerical computations such as those carried out for Figure 10 indicate that Equation (25) refers to the parameter λ (here Λ) used by Walck (2007).
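For a numerical comparison in the spirit of Figure 10, both densities can be coded in a few lines. The Python sketch below evaluates the ν₁ = 1 hypersphere density of Equations (35) and (36) in closed form, and the noncentral F-density of Equation (37) by summing its Poisson-weighted series in log space, which avoids overflow of the individual gamma-function ratios; it then checks that both integrate to one over 0 ⩽ r ⩽ 1. The truncation rule for the series, the quadrature grid and the function names are illustrative assumptions, and this naive summation is not meant to address the floating-point pitfalls documented by Baharev, Schichl, and Rév (2017). The two densities describe different models and are not expected to coincide for Λ > 0.

```python
import math


def simpson(f, a, b, n=400):
    """Composite Simpson rule for f on [a, b] with n (made even) subintervals."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def central_density_r(r, nu1, nu2):
    """Central Fisher-Snedecor density in r, Equation (24); assumes nu2 >= 2."""
    log_c = (math.log(2) + math.lgamma((nu1 + nu2) / 2)
             - math.lgamma(nu1 / 2) - math.lgamma(nu2 / 2))
    return math.exp(log_c) * r ** (nu1 - 1) * (1 - r * r) ** ((nu2 - 2) / 2)


def likelihood_F(r, nu1, nu2, lam, tol=1e-16, max_terms=100000):
    """Series likelihood L^F(r) of Equation (37), each term computed in log space."""
    if lam == 0.0 or r == 0.0:
        return math.exp(-lam / 2)              # only the j = 0 term survives
    a, b = (nu1 + nu2) / 2, nu1 / 2
    log_base = math.lgamma(b) - math.lgamma(a)
    total, prev = 0.0, 0.0
    for j in range(max_terms):
        log_term = (-lam / 2 + j * math.log(lam / 2) - math.lgamma(j + 1)
                    + math.lgamma(a + j) - math.lgamma(b + j) + log_base
                    + 2 * j * math.log(r))
        term = math.exp(log_term)
        total += term
        if j > 0 and term < prev and term < tol * total:   # past the peak and negligible
            break
        prev = term
    return total


def h_density_nu1_1(r, nu2, lam):
    """Equations (35)-(36): fold the noncentral h-density over r and -r; assumes nu2 >= 2."""
    c = math.sqrt(lam / (lam + nu2))
    k = math.exp(math.lgamma((nu2 + 1) / 2) - math.lgamma(0.5) - math.lgamma(nu2 / 2))

    def rho(x):
        return (k * (1 - x * c) * (1 - x * x) ** ((nu2 - 2) / 2)
                / (1 - 2 * x * c + c * c) ** ((nu2 + 1) / 2))

    return rho(r) + rho(-r)


if __name__ == "__main__":
    nu2, lam = 20, 1.0
    print(simpson(lambda r: h_density_nu1_1(r, nu2, lam), 0.0, 1.0))
    print(simpson(lambda r: likelihood_F(r, 1, nu2, lam) * central_density_r(r, 1, nu2), 0.0, 1.0))
```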

Figure 10. Left panels: probability density curves for the noncentral hypersphere distribution $ρ_{(ν_{1}, ν_{2}, Λ)}^{h} (r),$ for the parameters (Λ, ν₁) stated in the legend and ν₂ which can be deduced from the grayscale colorbar. Right panels: probability density curves for the noncentral F-distribution $ρ_{(ν_{1}, ν_{2}, Λ)}^{F} (r)$ for the same parameter set. The noncentral hypersphere distribution $ρ_{(ν_{1}, ν_{2}, Λ)}^{h} (r)$ offers a very economical analytic alternative to the noncentral F-distribution $ρ_{(ν_{1}, ν_{2}, Λ)}^{F} (r) .$


D. Analytic Expressions for the Cumulative Distribution Functions (p-values)

Cumulative distribution functions, and thus p-values,

$$p(θ) = \frac{Γ(\frac{ν_2 + 1}{2})}{Γ(\frac{1}{2}) Γ(\frac{ν_2}{2})} \int_0^θ \sin^{ν_2 - 1} θ'\, dθ', \quad 0 \leq θ \leq π, \tag{38}$$

for the central Fisher–Student t-distribution (Equation (3)) can be computed analytically in trigonometric terms. One finds, for ν₂ odd,

$$p(θ) = \frac{θ}{π} - \frac{\sin θ \cos θ}{π}\left(1 + \frac{2}{3}\sin^2 θ + \dots + \frac{2 \cdot 4 \cdots (ν_2 - 3)}{3 \cdot 5 \cdots (ν_2 - 2)} \sin^{ν_2 - 3} θ\right), \tag{39}$$

where only the first term should be retained for ν₂ = 1, the first two terms for ν₂ = 3, etc.; and, for ν₂ even,

$$p(θ) = \frac{1}{2} - \frac{\cos θ}{2}\left(1 + \frac{1}{2}\sin^2 θ + \dots + \frac{1 \cdot 3 \cdots (ν_2 - 3)}{2 \cdot 4 \cdots (ν_2 - 2)} \sin^{ν_2 - 2} θ\right), \tag{40}$$

where only the first two terms should be retained for ν₂ = 2, the first three terms for ν₂ = 4, etc. As expected, p(θ) = 0, 1/2 and 1 for θ = 0, π/2 and π, respectively. See also Chance (1986). Similarly, cumulative distribution functions, and thus p-values,

$$p_{(ν_1, ν_2)}(θ) = 2\frac{Γ(\frac{ν_1 + ν_2}{2})}{Γ(\frac{ν_1}{2}) Γ(\frac{ν_2}{2})} \int_0^θ \cos^{ν_1 - 1} θ' \sin^{ν_2 - 1} θ'\, dθ', \quad 0 \leq θ \leq \frac{π}{2}, \tag{41}$$

for the central Fisher–Snedecor F-distribution (Equation (22)) can be computed analytically in trigonometric terms.
For ν₁ even, we find

$$p_{(ν_1, ν_2)}(θ) = \sin^{ν_2} θ \left(1 + \frac{ν_2}{2} \cos^{2} θ + \dots + \frac{ν_2 (ν_2 + 2) \cdots (ν_1 + ν_2 - 4)}{2 \cdot 4 \cdots (ν_1 - 2)} \cos^{ν_1 - 2} θ\right),$$

for ν₂ even,

$$p_{(ν_1, ν_2)}(θ) = 1 - \cos^{ν_1} θ \left(1 + \frac{ν_1}{2} \sin^{2} θ + \dots + \frac{ν_1 (ν_1 + 2) \cdots (ν_1 + ν_2 - 4)}{2 \cdot 4 \cdots (ν_2 - 2)} \sin^{ν_2 - 2} θ\right),$$

while, for ν₁ and ν₂ both odd,

$$p_{(ν_1, ν_2)}(θ) = \frac{2}{π} \left(θ + A_{(ν_1, ν_2)}(θ) - B_{ν_2}(θ)\right),$$

where

$$\begin{aligned} A_{ν_1 = 1,\, ν_2}(θ) &= 0, \\ A_{ν_1 > 1,\, ν_2 = 1}(θ) &= \sin θ \cos θ \left(1 + \frac{2}{3} \cos^{2} θ + \dots + \frac{2 \cdot 4 \cdots (ν_1 - 3)}{3 \cdot 5 \cdots (ν_1 - 2)} \cos^{ν_1 - 3} θ\right), \\ A_{ν_1 > 1,\, ν_2 > 1}(θ) &= \left[\frac{(ν_2 - 1)(ν_2 - 3) \cdots 2}{(ν_2 - 2)(ν_2 - 4) \cdots 1}\right] \sin^{ν_2} θ \cos θ \\ &\quad \times \left(1 + \frac{ν_2 + 1}{3} \cos^{2} θ + \dots + \frac{(ν_2 + 1)(ν_2 + 3) \cdots (ν_1 + ν_2 - 4)}{3 \cdot 5 \cdots (ν_1 - 2)} \cos^{ν_1 - 3} θ\right), \end{aligned}$$

and

$$\begin{aligned} B_{ν_2 = 1}(θ) &= 0, \\ B_{ν_2 > 1}(θ) &= \sin θ \cos θ \left(1 + \frac{2}{3} \sin^{2} θ + \dots + \frac{2 \cdot 4 \cdots (ν_2 - 3)}{3 \cdot 5 \cdots (ν_2 - 2)} \sin^{ν_2 - 3} θ\right). \end{aligned}$$

See also Walck (Citation2007). All of the above formulae have been extensively verified to reproduce numerical outputs from software packages with statistical subroutines, such as R or MATLAB.
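As an independent sanity check of the closed forms (39) and (40) and of the three F-distribution cases, the sketch below evaluates each closed form and compares it with composite Simpson quadrature of the corresponding central density, Equation (3) or (22). It is an illustration only: Python is our choice here (the text cites R and MATLAB), and the names `t_pvalue`, `f_pvalue`, `_series`, `t_density`, `f_density` and `simpson` are hypothetical helpers introduced for this example. Each bracketed series is built term by term from the ratio of consecutive coefficients.

```python
import math


def _series(x2: float, n_terms: int, ratio) -> float:
    """1 + r(1)*x2 + r(1)*r(2)*x2**2 + ... with n_terms correction terms."""
    total, term = 1.0, 1.0
    for j in range(1, n_terms + 1):
        term *= ratio(j) * x2
        total += term
    return total


def t_pvalue(theta: float, nu2: int) -> float:
    """Eqs. (39)-(40): central hypersphere t p-value for integer nu2 >= 1, 0 <= theta <= pi."""
    if nu2 < 1:
        raise ValueError("nu2 must be a positive integer")
    s, c = math.sin(theta), math.cos(theta)
    if nu2 == 1:
        return theta / math.pi  # only the first term is retained
    if nu2 % 2 == 1:  # Eq. (39), coefficient ratios 2/3, 4/5, ...
        return theta / math.pi - s * c / math.pi * _series(s * s, (nu2 - 3) // 2, lambda j: 2 * j / (2 * j + 1))
    # Eq. (40), coefficient ratios 1/2, 3/4, ...
    return 0.5 - c / 2 * _series(s * s, (nu2 - 2) // 2, lambda j: (2 * j - 1) / (2 * j))


def f_pvalue(theta: float, nu1: int, nu2: int) -> float:
    """Closed-form central hypersphere F p-value for integers nu1, nu2 >= 1, 0 <= theta <= pi/2."""
    if nu1 < 1 or nu2 < 1:
        raise ValueError("nu1 and nu2 must be positive integers")
    s, c = math.sin(theta), math.cos(theta)
    if nu1 % 2 == 0:
        return s ** nu2 * _series(c * c, (nu1 - 2) // 2, lambda j: (nu2 + 2 * j - 2) / (2 * j))
    if nu2 % 2 == 0:
        return 1.0 - c ** nu1 * _series(s * s, (nu2 - 2) // 2, lambda j: (nu1 + 2 * j - 2) / (2 * j))
    # nu1 and nu2 both odd: p = (2/pi) * (theta + A - B)
    a = 0.0
    if nu1 > 1:
        # Bracketed prefactor (nu2-1)(nu2-3)...2 / [(nu2-2)(nu2-4)...1]; it equals 1 when
        # nu2 = 1, in which case this expression reduces to A for nu1 > 1, nu2 = 1.
        pref = math.prod(2 * i / (2 * i - 1) for i in range(1, (nu2 - 1) // 2 + 1))
        a = pref * s ** nu2 * c * _series(c * c, (nu1 - 3) // 2, lambda j: (nu2 + 2 * j - 1) / (2 * j + 1))
    b = 0.0
    if nu2 > 1:
        b = s * c * _series(s * s, (nu2 - 3) // 2, lambda j: 2 * j / (2 * j + 1))
    return 2.0 / math.pi * (theta + a - b)


def simpson(f, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson rule; n must be even."""
    h = (b - a) / n
    return h / 3 * (f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n)))


def t_density(theta: float, nu2: int) -> float:
    """Eq. (3) with nu = nu2; log-gamma avoids overflow for large nu2."""
    log_k = math.lgamma((nu2 + 1) / 2) - math.lgamma(0.5) - math.lgamma(nu2 / 2)
    return math.exp(log_k) * math.sin(theta) ** (nu2 - 1)


def f_density(theta: float, nu1: int, nu2: int) -> float:
    """Eq. (22), the central density on 0 <= theta <= pi/2."""
    log_k = math.log(2.0) + math.lgamma((nu1 + nu2) / 2) - math.lgamma(nu1 / 2) - math.lgamma(nu2 / 2)
    return math.exp(log_k) * math.cos(theta) ** (nu1 - 1) * math.sin(theta) ** (nu2 - 1)


if __name__ == "__main__":
    theta = 0.8
    for nu1, nu2 in [(1, 1), (2, 5), (3, 4), (5, 3), (7, 9)]:
        exact = f_pvalue(theta, nu1, nu2)
        numeric = simpson(lambda x: f_density(x, nu1, nu2), 0.0, theta)
        print(f"nu1={nu1} nu2={nu2}: closed form {exact:.12f}, quadrature {numeric:.12f}")
    print("t, nu2=6:", t_pvalue(theta, 6), simpson(lambda x: t_density(x, 6), 0.0, theta))
```

Building each series from coefficient ratios avoids forming the long products in the numerators and denominators explicitly, and the same concern motivates using `math.lgamma` in the densities.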

E. BF_ν(p) as normalized p-value distribution

When the Bayes factor BF_ν(θ) is reparameterized in terms of p(θ) as computed in Appendix D, the change of variables dp = ρ_{(ν,0)}(θ) dθ implied by Equation (38), an interchange of the order of integration, and the identity $ρ_{(ν, 0)}(θ)\, L^{h}_{(ν, \sqrt{ν} \cot θ_1)}(θ) = ρ^{h}_{(ν, \sqrt{ν} \cot θ_1)}(θ)$ give

$$\begin{aligned} \int_{p = 0}^{1} BF_{ν}(p)\, dp &= \int_{θ = 0}^{π} BF_{ν}(θ) \frac{dp(θ)}{dθ}\, dθ = \int_{θ = 0}^{π} ρ_{(ν, 0)}(θ)\, BF_{ν}(θ)\, dθ \\ &= \int_{θ = 0}^{π} ρ_{(ν, 0)}(θ) \left[\int_{θ_1 = 0}^{π} L^{h}_{(ν, \sqrt{ν} \cot θ_1)}(θ)\, Pr_{ν}(θ_1)\, dθ_1\right] dθ \\ &= \int_{θ_1 = 0}^{π} \left[\int_{θ = 0}^{π} ρ_{(ν, 0)}(θ)\, L^{h}_{(ν, \sqrt{ν} \cot θ_1)}(θ)\, dθ\right] Pr_{ν}(θ_1)\, dθ_1 \\ &= \int_{θ_1 = 0}^{π} \left[\int_{θ = 0}^{π} ρ^{h}_{(ν, \sqrt{ν} \cot θ_1)}(θ)\, dθ\right] Pr_{ν}(θ_1)\, dθ_1 = \int_{θ_1 = 0}^{π} Pr_{ν}(θ_1)\, dθ_1, \end{aligned}$$

where the last step uses the fact that each noncentral density $ρ^{h}_{(ν, \sqrt{ν} \cot θ_1)}(θ)$ integrates to one over θ. This demonstrates that BF_ν(p) is a normalized p-value density as long as the prior Pr_ν(θ₁) is itself a normalized probability density.
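The chain of equalities above only uses two facts: each alternative density is normalized, and the prior integrates to one. The sketch below checks this numerically under loudly stated assumptions. Because the noncentral h-density of Section 2 is not reproduced in this appendix, it substitutes a HYPOTHETICAL exponentially tilted family ρ_λ(θ) ∝ ρ_{(ν,0)}(θ) e^{λ cos θ} (whose likelihood ratio with respect to ρ_{(ν,0)} is e^{λ cos θ}/Z(λ)) and a HYPOTHETICAL three-point prior over λ in place of Pr_ν(θ₁), with ν = 5 chosen arbitrarily. The Bayes factor analogue is then integrated over p by inverting the closed-form p(θ) of Appendix D with bisection; the result should be approximately 1, matching the integral over θ.

```python
import math

NU = 5  # hypothetical degrees of freedom for this illustration


def series(x2, n_terms, ratio):
    """1 + r(1)*x2 + r(1)*r(2)*x2**2 + ... with n_terms correction terms."""
    total, term = 1.0, 1.0
    for j in range(1, n_terms + 1):
        term *= ratio(j) * x2
        total += term
    return total


def p_of_theta(theta, nu=NU):
    """Central p(theta) from Eqs. (39)-(40); increasing from 0 at theta=0 to 1 at theta=pi."""
    s, c = math.sin(theta), math.cos(theta)
    if nu == 1:
        return theta / math.pi
    if nu % 2 == 1:
        return theta / math.pi - s * c / math.pi * series(s * s, (nu - 3) // 2, lambda j: 2 * j / (2 * j + 1))
    return 0.5 - c / 2 * series(s * s, (nu - 2) // 2, lambda j: (2 * j - 1) / (2 * j))


def rho0(theta, nu=NU):
    """Central density rho_(nu,0)(theta) of Eq. (3)."""
    log_k = math.lgamma((nu + 1) / 2) - math.lgamma(0.5) - math.lgamma(nu / 2)
    return math.exp(log_k) * math.sin(theta) ** (nu - 1)


def simpson(f, a, b, n=2000):
    """Composite Simpson rule; n must be even."""
    h = (b - a) / n
    return h / 3 * (f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n)))


# HYPOTHETICAL stand-in for the noncentral h-distribution (not the paper's density):
# rho_lam(theta) = rho0(theta) * exp(lam * cos(theta)) / Z(lam).
PRIOR = [(-1.5, 0.2), (0.5, 0.3), (2.0, 0.5)]  # hypothetical (lam, weight) pairs, weights sum to 1
Z = {lam: simpson(lambda t, lam=lam: rho0(t) * math.exp(lam * math.cos(t)), 0.0, math.pi) for lam, _ in PRIOR}


def bf_theta(theta):
    """Prior-averaged likelihood ratio, the analogue of BF_nu(theta)."""
    return sum(w * math.exp(lam * math.cos(theta)) / Z[lam] for lam, w in PRIOR)


def theta_of_p(p, tol=1e-13):
    """Invert the monotone map theta -> p(theta) on [0, pi] by bisection."""
    lo, hi = 0.0, math.pi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if p_of_theta(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bf_p_integral(n=2000):
    """Midpoint-rule estimate of the integral of BF(p) over p in [0, 1]."""
    h = 1.0 / n
    return h * sum(bf_theta(theta_of_p((i + 0.5) * h)) for i in range(n))


if __name__ == "__main__":
    print("integral over p:    ", bf_p_integral())
    print("integral over theta:", simpson(lambda t: rho0(t) * bf_theta(t), 0.0, math.pi))
```

The p-side estimate carries a small discretization error near p = 0 and p = 1, where θ(p) varies steeply, so it agrees with 1 only approximately; the θ-side integral agrees to rounding because Z(λ) is computed with the same quadrature rule.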

Bayesian Analysis on a Noncentral Fisher–Student’s Hypersphere
