Abstract
While unbiased central moment estimators of lower orders (such as a sample variance) are easily obtainable and often used in practice, derivation of unbiased estimators of higher orders might be more challenging due to long math and tricky combinatorics. Moreover, higher orders necessitate calculation of estimators of powers and products that also amount to these orders. We develop a software algorithm that allows the user to obtain unbiased estimators of an arbitrary order and provide results up to the sixth order, including powers and products of lower orders. The method also extends to finding pooled estimates of higher central moments of several different populations (e.g. for two-sample tests). We introduce an R package Umoments that calculates one- and two-sample estimates and generates intermediate results used to obtain these estimators.
PUBLIC INTEREST STATEMENT
Higher-order statistics are increasingly used in various research fields and data analysis, and central moment estimates are useful for many approaches. Derivation of higher-order unbiased central moment estimators has long been a challenging task; software made the general order solution possible. This paper describes a direct approach to obtaining estimators of any order, including multi-sample pooled estimators. It also introduces an open source R package Umoments, which calculates one- and two-sample estimates up to the sixth order and contains machinery to obtain even higher-order estimates, including a combinatorial algorithm that can be used for solving other problems and can assist in long derivations (e.g. Edgeworth expansions).
1. Introduction
Most data analysis methods rely on estimating unknown quantities such as characteristics of an underlying distribution or an effect of a treatment. From a variety of possible estimators of an unknown true parameter, the ones that are typically chosen have certain desirable properties—e.g. consistency, efficiency, or unbiasedness. When the sample size is moderate or small, finite sample behavior of an estimator—such as bias, variability, and mean squared error—is particularly relevant and is therefore often given special consideration. In addition, when estimation is conducted across multiple samples or studies (pooled estimators), bias may become an important issue.
Moments of a distribution are the most basic building blocks of statistical analysis and their estimates are present in some form in virtually any practical application. A sample average is an estimate of the mean (first moment). Estimates of the variance (second central moment) are routinely used in statistical inference; they are present in all studentized statistics, of which the most common example is an ordinary $t$-statistic. Higher moments and their estimates, while not as widely used, can be important in various statistical applications and inferential procedures; they are also the building blocks of cumulants and their scaled versions (skewness, kurtosis). They are often used in signal processing, financial modeling, and many other areas (for a list of applications, see Pebay, Terriberry, Kolla, & Bennett, Citation2014). Methods that employ higher-order statistics might utilize data in a more efficient way and offer greater insight into the distribution of interest, thus providing additional refinement in inference, for example through the use of higher-order approximations to the distribution of a test statistic—such as empirical Edgeworth expansions (Bickel, Citation1974; Hall, Citation1987; Putter & van Zwet, Citation1998). These methods require higher-order moment estimation and warrant consideration of estimators' finite sample properties; since moderate or small sample analysis would benefit from such higher-order approaches, unbiased estimates could prove particularly useful.
Naïve estimators $m_k = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^k$, $k = 2, 3, \ldots$, of central moments $\mu_k = E\left(X - EX\right)^k$ are biased—that is, $E\left(m_k\right) \neq \mu_k$. The first unbiased estimator was introduced for the variance by Friedrich Bessel; it is obtained by multiplying $m_2$ by a factor $\frac{n}{n-1}$ and is thus often called Bessel's correction. That estimator is part of an ordinary $t$-statistic and therefore plays a role in Student's $t$-distribution; it corresponds to the degrees of freedom in the chi-squared distribution that arises from $\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$ when $X$ is a standard normal random variable. The corresponding standard deviation estimator, however, is still biased (though the bias is reduced) and underestimates the true parameter. Interest in unbiased moment and cumulant estimation has a long history, which has led to theoretical advances and various strategies for obtaining higher-order estimators. The 1928 work of R.A. Fisher (Fisher, Citation1930) provided the basis for much of this research, particularly on cumulants; for central moments, unbiased estimators up to the fourth order (or "weight" in some literature) were published by Harald Cramér in 1946 (Cramér, Citation2016); the results were later extended to more complex settings (e.g. incorporating weights (Rimoldini, Citation2014)).
Whereas derivation of unbiased moment estimators is in general straightforward, higher-order calculations involve long algebra and nontrivial coefficients, and brute-force calculation of these coefficients becomes unfeasible fairly quickly. Having observations from different populations or categories, requiring pooled estimators, compounds the problem. Unlike the second and third central moments, where naïve biased moment estimators differ from unbiased ones by a constant factor that does not depend on the data, subsequent orders require calculation of combinations (integer powers and products) of lower moments that amount to the same order, which in turn creates systems of equations to be solved. With computer algebra, manipulating long algebraic expressions and solving reasonably large systems of linear equations is no longer an issue; the challenge can then be condensed to finding an expectation of the form $E\left[\prod_{j}\left(\sum_{i=1}^{n} X_i^{k_j}\right)\right]$, where $k_j \in \mathbb{N}$, of an arbitrary order and length, written in terms of the sample size $n$ and true central moments of the distribution of $X$. Thus, a software algorithm that solves this problem, together with computer algebra, can provide the machinery needed to obtain one-sample and multi-sample pooled estimates of any order, limited only by available processing power.
General order solutions for many problems formulated in the course of unbiased estimation history, including cumulant and moment estimation, are provided in mathStatica (Rose & Smith, Citation2002), an add-on package for the proprietary computer algebra system Mathematica. Still, given many potential uses for such estimates, there is a need for open-source software and easy-to-use tools, accessible to a wide range of researchers, that could be seamlessly incorporated into data analysis. Multi-sample pooled estimation, which has not received much attention in the higher-order statistics pursuit (and is not included in mathStatica), can have many practical applications, especially in two-sample settings (e.g. comparing treatment and control groups). In addition, open access to the code and algorithms used to generate arbitrary order estimates can help in obtaining other statistical results, e.g. Edgeworth expansions. We introduce an R package Umoments (Gerlovina & Hubbard, Citation2019), which provides pre-programmed functions that calculate one- and two-sample estimates up to the sixth order, either from data or from naïve biased estimates, as well as algorithms and tools for generating general order estimators.
In this paper, we break down the procedure for obtaining unbiased moment estimators of an arbitrary order, as well as estimators of products and powers of moments (also referred to as generalized $h$-statistics (Tracy & Gupta, Citation1974)) such as $\widehat{\mu_2^2}$; an analogous procedure is provided for multi-sample pooled estimators. Additionally, this direct approach is illustrated in the Sage and Jupyter templates at https://github.com/innager/unbiasedMoments. Next, we describe the algorithm that generates an expression for the expectation of raw (non-central) sample moments and their powers and products, thus automating the challenging part of the derivation. The Results section provides a full set of one-sample unbiased estimators up to the sixth order; two-sample pooled estimators can be found in the Umoments package, but orders four and higher are too long to include in the paper. The Results are followed by a quick overview of Umoments package functions; we conclude with a discussion of practical applications of these estimators.
2. Procedure in general
2.1. One-sample estimates
For simplicity, we can consider a mean-zero random variable without any loss of generality. Let $X_1, \ldots, X_n$ be a sample of independent identically distributed random variables with $E(X) = 0$ and central moments $\mu_k = E\left(X - EX\right)^k$ (mean $\mu_1 = 0$, variance $\mu_2 = \sigma^2$), in this case equal to raw moments: $\mu_k = E\left(X^k\right)$. We also adopt the following useful notation:

$m_k = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^k$—naïve biased central moment estimators;

$\widehat{(\,\cdot\,)}$—unbiased estimator of an expression inside the parentheses (for quantities such as central moments and their powers and products), e.g. $E\left(\widehat{\mu_2^2}\right) = \mu_2^2$.
The steps to obtain unbiased estimators of a general order are straightforward:

1. for a desired order, list all the moment combinations for that order (example provided below);
2. expand their naïve biased estimators (remove the brackets);
3. take expectations and represent the results in terms of moments $\mu_k$ and sample size $n$; this will produce an equation or a system of equations;
4. solve this equation or system of equations for the true moments.
As an illustration, we go through these steps for $\widehat{\mu_3}$, an unbiased estimator of the third central moment. The only third-order combination is $\mu_3$ itself; expanding $m_3$ and taking the expectation yields

$$E\left(m_3\right) = \frac{(n-1)(n-2)}{n^2}\,\mu_3,$$

and solving for $\mu_3$ gives

$$\widehat{\mu_3} = \frac{n^2}{(n-1)(n-2)}\,m_3. \tag{1}$$

Steps 1 and 2 are trivial and are performed using computer algebra. Calculation of any unbiased moment estimator of a given order involves all the combinations (powers and products of moments) of that order, which means that for fourth and higher orders there will be a system of equations rather than a single equation (recall that $\mu_1 = 0$). For example, estimators of the seventh order will include $\mu_7$, $\mu_2\mu_5$, $\mu_3\mu_4$, and $\mu_2^2\mu_3$ (Step 1); Step 2 will correspondingly expand $m_7$, $m_2 m_5$, $m_3 m_4$, and $m_2^2 m_3$, producing four equations. Since the equations need to be solved for a given order's combinations of true moments, not individual moments, and all the equations in the system are linear in that order, it makes sense to treat these combinations as single variables, thus solving a system of linear equations.
Step 3 is more challenging, but the problem can be reduced to finding an expression for $E\left[\prod_{j}\left(\sum_{i=1}^{n} X_i^{k_j}\right)\right]$ since any term from the right-hand side of the Step 3 equations can be written in that form—e.g. $E\left(\bar{X}^2\sum_{i=1}^{n} X_i^3\right) = \frac{1}{n^2}\,E\left[\left(\sum_{i=1}^{n} X_i\right)^2\sum_{i=1}^{n} X_i^3\right]$. A general solution to this problem is provided in the Umoments package (Gerlovina & Hubbard, Citation2019), which generates expressions for these expectations using combinatorics. This algorithm is explained in detail in Section 3.
2.2. Pooled estimates
A simple extension of the method can be used to obtain unbiased estimators of central moments for samples that contain observations from several populations or categories. We demonstrate the procedure on a two-category estimation, which extends trivially to any number of categories.
For a sample $X_1, \ldots, X_{n_x}, Y_1, \ldots, Y_{n_y}$, let

$$m_k = \frac{1}{n_x + n_y}\left[\sum_{i=1}^{n_x}\left(X_i - \bar{X}\right)^k + \sum_{i=1}^{n_y}\left(Y_i - \bar{Y}\right)^k\right],$$

where $m_k$ is a two-sample analog of the naïve biased estimator described previously. Note that pooled estimation implies an assumption of equality of the estimated central moments between the distributions of $X$ and $Y$: $\mu_k = E\left(X - EX\right)^k = E\left(Y - EY\right)^k$. Using this assumption, the independence of $X$ and $Y$, and the one-sample results from Step 3 in Section 2.1, we extend Step 2 of the roadmap to incorporate two variables and obtain expectations.
Example: obtain a two-sample pooled estimate of the third central moment. Using the one-sample result (1), we get

$$E\left[\sum_{i=1}^{n_x}\left(X_i - \bar{X}\right)^3 + \sum_{i=1}^{n_y}\left(Y_i - \bar{Y}\right)^3\right] = \left[\frac{(n_x - 1)(n_x - 2)}{n_x} + \frac{(n_y - 1)(n_y - 2)}{n_y}\right]\mu_3, \tag{2}$$

which yields

$$\widehat{\mu_3} = \left[\frac{(n_x - 1)(n_x - 2)}{n_x} + \frac{(n_y - 1)(n_y - 2)}{n_y}\right]^{-1}\left[\sum_{i=1}^{n_x}\left(X_i - \bar{X}\right)^3 + \sum_{i=1}^{n_y}\left(Y_i - \bar{Y}\right)^3\right]. \tag{3}$$

For this particular example, the result matches the one-sample case if $n_x = n_y$. That is not true in general, however. All the higher orders involve powers and products of lower moments that need to be expanded before taking expectations, affecting the systems of equations.
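The pooled third-moment construction can be verified the same way as the one-sample case: by independence, the expectation of the pooled numerator is the sum of the two one-sample expectations, so dividing by the combined factor leaves an unbiased estimator. A small Python check (an illustration outside the R package, assuming both samples come from the same two-point mean-zero distribution):

```python
from itertools import product

def exact_E_sum_cubes(vals, probs, n):
    """Exact E[sum_i (X_i - Xbar)^3] over all samples of size n from a
    finite-support distribution."""
    total = 0.0
    for smp in product(range(len(vals)), repeat=n):
        p = 1.0
        for i in smp:
            p *= probs[i]
        x = [vals[i] for i in smp]
        xbar = sum(x) / n
        total += p * sum((xi - xbar) ** 3 for xi in x)
    return total

# both samples from the same mean-zero two-point distribution (mu_3 = 2)
vals, probs = [2.0, -1.0], [1 / 3, 2 / 3]
mu3, nx, ny = 2.0, 4, 3
# by independence, the expected pooled numerator is the sum of the two
# one-sample expectations E[sum (X - Xbar)^3] = (n-1)(n-2)/n * mu3
numerator = exact_E_sum_cubes(vals, probs, nx) + exact_E_sum_cubes(vals, probs, ny)
denominator = (nx - 1) * (nx - 2) / nx + (ny - 1) * (ny - 2) / ny
pooled = numerator / denominator                    # recovers mu3 exactly
```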
3. Generating expressions for expectations
To derive general order expectations of naïve moment estimators and their powers and products, one needs to find expectations of products of power sums, $E\left[\prod_{d}\left(\sum_{i=1}^{n} X_i^{k_d}\right)^{l_d}\right]$. To build up to this, we first describe the procedure for generating $E\left[\left(\sum_{i=1}^{n} X_i\right)^m\right]$, which easily extends to $E\left[\left(\sum_{i=1}^{n} X_i\right)^m\left(\sum_{i=1}^{n} X_i^k\right)^l\right]$ and then to the general case that involves an arbitrarily long product of power sums.
3.1. Generate $E\left[\left(\sum_{i=1}^{n} X_i\right)^m\right]$

$$E\left[\left(\sum_{i=1}^{n} X_i\right)^m\right] = \sum_{i_1=1}^{n}\cdots\sum_{i_m=1}^{n} E\left(X_{i_1} X_{i_2}\cdots X_{i_m}\right). \tag{4}$$
To find Equation (4), we need to consider all the different combinations of ordered indices $(i_1, \ldots, i_m)$; $1 \leq i_j \leq n$ for each $j$. There are $n^m$ such combinations, but many combinations yield the same $E\left(X_{i_1}\cdots X_{i_m}\right)$—for example, $E\left(X_1^2 X_2^2 X_3^3\right) = E\left(X_2^2 X_5^2 X_7^3\right)$. Combinations that produce the same expectation form a set that we will call a grouping (similar to "partitions" and "augmented symmetric functions" in some terminology), and the problem therefore reduces to considering all the groupings (each producing a distinct expectation) and calculating their coefficients, which are the numbers of combinations in each set. Each product $X_{i_1}\cdots X_{i_m}$ can be broken into smaller products, or groups, of $X$'s with the same indices, such as $X_1\cdot X_1 = X_1^2$, $X_3\cdot X_3\cdot X_3 = X_3^3$. The number of groups ranges between $1$ (when all the indices are the same: $E\left(X_i^m\right) = \mu_m$) and $m$ (when all the indices are different: $E\left(X_{i_1}\right)\cdots E\left(X_{i_m}\right) = 0$); the sizes of these groups determine $E\left(X_{i_1}\cdots X_{i_m}\right)$. Thus, each grouping is fully characterized by the number of groups and the group sizes.
Let $K$ denote the number of groups in one grouping and $t_1, \ldots, t_K$—the numbers of $X$'s in each group, $\sum_{j=1}^{K} t_j = m$; the set of group sizes is unordered, so assigning indices to $t$'s is arbitrary (e.g. decreasing). In the example above: $K = 3$, $t_1 = 3$, $t_2 = 2$, $t_3 = 2$, and $E\left(X_1^2 X_2^2 X_3^3\right) = \mu_3\mu_2^2$. If $\min_j t_j = 1$ (at least one group is of size $1$), the expectation is $0$ since $E\left(X_i\right) = 0$ and there is no need to calculate a coefficient for this grouping, which is important in terms of computational efficiency; otherwise, $E\left(X_{i_1}\cdots X_{i_m}\right) = \prod_{j=1}^{K}\mu_{t_j}$. By adding a subscript $g$ to indicate a grouping, we get

$$E\left[\left(\sum_{i=1}^{n} X_i\right)^m\right] = \sum_{g}\, c_g \prod_{j=1}^{K_g}\mu_{t_{j,g}}, \tag{5}$$

where $c_g$ is the coefficient for grouping $g$, i.e. the number of combinations that yield $\prod_{j}\mu_{t_{j,g}}$.
The coefficient is calculated as

$$c_g = \frac{m!}{t_1!\cdots t_K!}\cdot\frac{1}{r_1!\cdots r_L!}\; n(n-1)\cdots(n-K+1), \tag{6}$$

where $L$ is the number of distinct group sizes and the $r_l$'s are the numbers of same-sized groups if there are any—e.g. for group sizes $4$, $4$, and $2$, we get $L = 2$, $r_1 = 2$ (two groups of size $4$), and $r_2 = 1$. In this particular setting, $c_g$ is analogous to the partition coefficient described in the literature (Dwyer, Citation1938; Fisher, Citation1930). Going back to our original example (group sizes $3$, $2$, $2$), there is only one repeated size: $r_1 = 2$; the coefficient for that example is $c_g = \frac{7!}{3!\,2!\,2!}\cdot\frac{1}{2!}\; n(n-1)(n-2) = 105\, n(n-1)(n-2)$.
One way of arriving at the expression for $c_g$ could be the following: there are $\binom{n}{K}\frac{K!}{r_1!\cdots r_L!}$ ways to pick (unordered) indices that satisfy the given group sizes $\{t_1, \ldots, t_K\}$ and $\frac{m!}{t_1!\cdots t_K!}$ ways to place these indices on $m$ positions (a multinomial coefficient).
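This combinatorial argument can be tested directly against brute-force enumeration. A Python sketch (illustrative, not package code) that classifies all $n^m$ ordered index tuples by their multiset of group sizes and compares the counts with the closed form:

```python
from collections import Counter
from itertools import product
from math import factorial, perm

def grouping_counts(n, m):
    """Brute force: over all n^m ordered index tuples (i_1, ..., i_m),
    count how many tuples produce each multiset of group sizes."""
    counts = Counter()
    for idx in product(range(n), repeat=m):
        sizes = tuple(sorted(Counter(idx).values(), reverse=True))
        counts[sizes] += 1
    return counts

def coefficient(sizes, n):
    """Closed form: m!/prod(t_j!) * 1/prod(r_l!) * n(n-1)...(n-K+1),
    where r_l are the multiplicities of repeated group sizes."""
    m, K = sum(sizes), len(sizes)
    c = factorial(m)
    for t in sizes:
        c //= factorial(t)
    for r in Counter(sizes).values():
        c //= factorial(r)
    return c * perm(n, K)

n, m = 5, 4
brute = grouping_counts(n, m)
# e.g. the pair grouping {2, 2}: 3 * n * (n - 1) = 60 index tuples for n = 5
```

Every group-size signature, including those that later drop out because of a singleton group, is counted here, so the counts sum to $n^m$.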
Our software generates expressions for $E\left[\left(\sum_{i=1}^{n} X_i\right)^m\right]$ for a given $m$ using the method described above. To find all the possible groupings, we impose an ordering on them and use it to generate each consecutive grouping when the previous one is given, thus moving through a complete set of groupings from $\{1, 1, \ldots, 1\}$ to $\{m\}$. For example, in an agglomerative order, a grouping $\{3, 1, 1\}$ is preceded by $\{2, 2, 1\}$ and followed by $\{3, 2\}$.
The smallest number of groups is $K = 1$, which produces a term of order $n^{-(m-1)}$ once the power sums are normalized by $n$ (the highest order in the range); the largest $K$ with a non-zero contribution to the expectation is $K = \lfloor m/2\rfloor$ (when the indices of $X_{i_1}\cdots X_{i_m}$ appear in pairs and there are no unpaired indices; when $m$ is odd, one of the groups is of size $3$), and the order it produces is $n^{-\lceil m/2\rceil}$.
3.2. Generate $E\left[\left(\sum_{i=1}^{n} X_i\right)^m\left(\sum_{i=1}^{n} X_i^k\right)^l\right]$

$$E\left[\left(\sum_{i=1}^{n} X_i\right)^m\left(\sum_{i=1}^{n} X_i^k\right)^l\right]. \tag{7}$$
To generate expressions for Equation (7), we extend the algorithm described in Section 3.1 for Equation (4). Now groups consist of $X_i$'s and $X_i^k$'s with the same indices, e.g. $X_1\cdot X_1\cdot X_1^k = X_1^{2+k}$, $X_2^k\cdot X_2^k = X_2^{2k}$, and are thus described not by a single number (group size) but by a pair $(t, s)$, where $t$ is the number of $X_i$'s and $s$ is the number of $X_i^k$'s in the group. Consequently, a grouping in this version is characterized by a set of pairs $(t_j, s_j)$, $j = 1, \ldots, K$; $\sum_j t_j = m$, $\sum_j s_j = l$, and its definition is different from the one in Section 3.1 since for given group totals $t_j + k\,s_j$ there can be different groupings that yield the same expectation; e.g. for $k = 2$, $m = 4$, and $l = 2$, the groupings $\{(4, 0), (0, 1), (0, 1)\}$, $\{(2, 1), (2, 0), (0, 1)\}$, and $\{(0, 2), (2, 0), (2, 0)\}$ will all produce $\mu_4\mu_2^2$. Analogously to the original version, if $t_j = 1$ and $s_j = 0$ for some $j$ (at least one pair in the grouping is $(1, 0)$), the expectation is $0$; otherwise, the expectation is $\prod_{j=1}^{K}\mu_{t_j + k\,s_j}$. Note that to account for all the possible groupings in this case, permutations need to be used, adding another layer to computational complexity.
The coefficient $c_g$ for a grouping $g$ is calculated in a similar way to Section 3.1 (Equation (6)) with a few adjustments:

$$c_g = \frac{m!}{t_1!\cdots t_K!}\cdot\frac{l!}{s_1!\cdots s_K!}\cdot\frac{1}{r_1!\cdots r_L!}\; n(n-1)\cdots(n-K+1), \tag{8}$$

where the $r_l$'s are the numbers of the groups with the same values for both $t$ and $s$.
In this case, the order of the terms ranges from $n^{-(m+l-1)}$ after normalization, when $K = 1$ (the expectation reduces to a multiple of $\mu_{m+kl}$), to $n^{-\lceil m/2\rceil}$, when all indices of the $X_i$'s appear in pairs if $m$ is even (an "extra" index joining one of the groups if $m$ is odd), and all the $X_i^k$'s carry indices different from the $X_i$'s and from each other ($K = \lfloor m/2\rfloor + l$).
3.3. General case
The procedure in Section 3.2 easily generalizes to finding $E\left[\prod_{d=1}^{D}\left(\sum_{i=1}^{n} X_i^{k_d}\right)^{l_d}\right]$ for an arbitrary number of power sums $D$, with groups described by a "tuple" of length $D$ and a grouping being a collection of such tuples. Coefficients $c_g$ for groupings $g$ are calculated similarly to Equation (8), accounting for all the elements in each tuple.
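The general case can also be brute-forced directly, which is feasible only for small $n$ and low orders (unlike the grouping algorithm) but is useful for validation. A Python sketch that enumerates all index tuples; with `powers = [1, 1, 1, 3, 3, 4]` it corresponds to the `one_combination(c(3, 0, 2, 1), "n_x")` example in Section 5 and reproduces its coefficients, e.g. the $n_x\mu_{13}$ and $3n_x(n_x-1)\mu_2\mu_{11}$ terms, up to the overall $n_x^{-6}$ normalization:

```python
from collections import Counter
from itertools import product

def power_sum_expectation(powers, n):
    """E[ prod_d (sum_{i=1}^n X_i^{k_d}) ] for mean-zero i.i.d. X, returned
    as {sorted tuple of moment orders: integer coefficient}; `powers` lists
    one exponent k_d per power-sum factor (repeat a factor to raise its
    power)."""
    result = Counter()
    for idx in product(range(n), repeat=len(powers)):
        groups = Counter()
        for i, k in zip(idx, powers):
            groups[i] += k              # total exponent collected by index i
        orders = tuple(sorted(groups.values()))
        if orders[0] == 1:              # a lone X_i contributes E(X) = 0
            continue
        result[orders] += 1
    return dict(result)

# (sum X_i)^3 * (sum X_i^3)^2 * (sum X_i^4), evaluated at n = 2
res = power_sum_expectation([1, 1, 1, 3, 3, 4], n=2)
```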
4. Results (up to sixth order)
Below are the results generated with our software (SymPy code that produces these results is in a Jupyter notebook at https://github.com/innager/unbiasedMoments):
For two-sample pooled estimators up to the sixth order, refer to the Umoments package (Gerlovina & Hubbard, Citation2019) and the Sage worksheet at https://github.com/innager/unbiasedMoments.
5. Umoments R package
Umoments contains a set of pre-programmed functions that calculate one-sample and pooled two-sample unbiased moment estimates, both up to the sixth order. This functionality is primarily useful for data analysis. The estimates can be calculated either directly from the sample or from naïve biased estimates, in which case the sample size needs to be provided. For two-sample estimation, the input should also include labels indicating which observation belongs to which sample/category, or both sample sizes $n_x$ and $n_y$. Below are some examples.
Two-sample pooled estimates from the data up to sixth order (note that smp is a data vector, and treatment is a vector of labels that separates it into two categories):
> uMpool(smp, treatment, 6)
M2 M3 M2pow2 M4 M2M3 M5 M2pow3
1.6443027 1.5188515 2.4878505 6.9794503 2.0615514 17.0989234 3.5236856
M3pow2 M2M4 M6
0.6674177 9.4046220 56.6016025
Unbiased estimate of $\mu_3^2$ from naïve biased second, third, fourth, and sixth moment estimates:
> uM3pow2(m[2], m[3], m[4], m[6], n)
[1] -10.00696
Other functions in the package can be used to obtain higher-order estimators, pooled estimators across multiple (three or more) samples, and other statistical results.
Generate $E\left[\left(\frac{1}{n_x}\sum_{i} X_i\right)^3\left(\frac{1}{n_x}\sum_{i} X_i^3\right)^2\left(\frac{1}{n_x}\sum_{i} X_i^4\right)\right]$ for a sample $X_1, \ldots, X_{n_x}$ (the output is a string that could be used as a code chunk, fed into a computer algebra system, or converted into LaTeX):
> one_combination(c(3, 0, 2, 1), "n_x")
[1] "(1*n_x*mu13^1 + 3*n_x*(n_x-1)*mu2^1*mu11^1 + 3*n_x*(n_x-1)*(n_x-2)*(n_x-3)*mu5^1*mu3^2*mu2^1 + 6*n_x*(n_x-1)*(n_x-2)*(n_x-3)*mu4^2*mu3^1*mu2^1 + 6*n_x*(n_x-1)*(n_x-2)*mu8^1*mu3^1*mu2^1 + 9*n_x*(n_x-1)*(n_x-2)*mu7^1*mu4^1*mu2^1 + 3*n_x*(n_x-1)*(n_x-2)*mu6^1*mu5^1*mu2^1 + 3*n_x*(n_x-1)*mu3^1*mu10^1 + 1*n_x*(n_x-1)*(n_x-2)*(n_x-3)*mu4^1*mu3^3 + 3*n_x*(n_x-1)*(n_x-2)*mu7^1*mu3^2 + 9*n_x*(n_x-1)*(n_x-2)*mu6^1*mu4^1*mu3^1 + 6*n_x*(n_x-1)*(n_x-2)*mu5^2*mu3^1 + 12*n_x*(n_x-1)*(n_x-2)*mu5^1*mu4^2 + 7*n_x*(n_x-1)*mu9^1*mu4^1 + 9*n_x*(n_x-1)*mu8^1*mu5^1 + 6*n_x*(n_x-1)*mu7^1*mu6^1)/n_x^6"
Generate groupings for $m = 5$ (see Section 3.1):
> Umoments:::groups(5)
[[1]]
[1] 1 1 1 1 1
[[2]]
[1] 2 1 1 1
[[3]]
[1] 2 2 1
[[4]]
[1] 3 1 1
[[5]]
[1] 3 2
[[6]]
[1] 4 1
[[7]]
[1] 5
For further details and examples, refer to package vignette and documentation (Gerlovina & Hubbard, Citation2019).
6. Discussion
The difference between unbiased and biased estimators depends on the sample size and might be considerable for small samples; also, for a fixed sample size, it is relatively greater for higher orders. At the same time, variability of the estimators is an important factor to be considered in this bias-variance trade-off, especially in connection with sample size and the order of the estimators, as variability increases with higher orders (which might be offset by the lower contribution/weight of these orders in certain methods) and smaller samples. Another question is the relationship between $n$ and the maximal order that could reasonably be used in a method; besides the purely algebraic restrictions on the sample size for a given order, apparent from the expressions for unbiased estimators ($n \geq k$ for $k$th-order estimators), there might be another, stricter underlying relationship that needs to be explored, either theoretically or numerically.
While unbiased estimators of products and integer powers of moments are possible to obtain, that is not the case with ratios and roots. Of course, such biased estimators, like the square root of a sample variance or a skewness estimator, are widely used in practice. Adding to the complexity is the fact that since an unbiased estimator of a ratio cannot be obtained, simplifying expressions should also be questioned—consider, for example, the scaled sixth cumulant

$$\frac{\kappa_6}{\mu_2^3} = \frac{\mu_6 - 15\mu_4\mu_2 - 10\mu_3^2 + 30\mu_2^3}{\mu_2^3}.$$

For a closest estimate, it is natural to consider the ratio of an unbiased cumulant estimator and an unbiased scaling factor $\widehat{\mu_2^3}$. Then, is $\widehat{\mu_4\mu_2}\,/\,\widehat{\mu_2^3}$ preferable to $\widehat{\mu_4}\,/\,\widehat{\mu_2^2}$ for the second term?
This example also provides an illustration of another important consideration that should factor into the decision of which estimators to use—variability of the denominator in studentized statistics. In the scaled cumulant above, the sixth cumulant $\kappa_6$ is scaled by $\mu_2^3$; to substitute for this unknown quantity, a variety of estimators can be used: $m_2^3$, $\widehat{\mu_2}^{\,3}$, or $\widehat{\mu_2^3}$, to name a few. While the expression for $\widehat{\mu_2}$ (and thus its cube) contains $m_2$ only, the expression for $\widehat{\mu_2^3}$ includes higher-order estimates such as $m_4$ and $m_6$ as well. These higher-order quantities may be highly variable, especially in small samples, and therefore the whole ratio becomes highly sensitive to small values of the estimates in the denominator, which can inflate the statistic dramatically, increasing the variability of the ratio to the point of unusability. Therefore, it might be advisable in certain cases, e.g. with considerably skewed distributions, to perform some numeric exploration to determine whether it might indeed be preferable to use lower-order estimators, naïve biased or unbiased, in place of the parameters in denominators because of their relative stability ("power of mean" instead of "mean of power").
Correction
This article has been republished with minor changes. These changes do not impact the academic content of the article.
Additional information
Funding
Notes on contributors
Inna Gerlovina
Inna Gerlovina is a postdoctoral scholar at the University of California, San Francisco. She has worked on small sample inference and error rate control as well as higher-order inferential approaches, developing software packages for high-dimensional data analysis. Her current interests include development, implementation, and application of statistical methods that contribute to the understanding of malaria epidemiology and transmission. Inna completed her MA and PhD in Biostatistics at the University of California, Berkeley.
Alan E. Hubbard
Alan Hubbard, Professor of Biostatistics, University of California, Berkeley, is a director of the Biomedical Big Data pre-doctoral training program, and co-director of the Center of Targeted Learning, is head of the computational biology Core E of the SuperFund Center at UC Berkeley (NIH/EPA), as well a consulting statistician on several federally funded and foundation projects. He has worked as well on projects ranging from molecular biology of aging, epidemiology, and infectious disease modeling, but most all of his work has focused on semi-parametric estimation in high-dimensional data. His current methods-research focuses on precision medicine, variable importance, statistical inference for data-adaptive parameters, and statistical software implementing targeted learning methods. He is currently working in several areas of applied research, including early childhood development in developing countries, patient outcomes from acute trauma, environmental genomics and comparative effectiveness research in diabetes care.
References
- Bickel, P. (1974). Edgeworth expansions in nonparametric statistics. The Annals of Statistics, 2, 1–11. doi:10.1214/aos/1176342609
- Cramér, H. (2016). Mathematical methods of statistics (PMS-9) (Vol. 9). Princeton, NJ: Princeton University Press.
- Dwyer, P. S. (1938). Combined expansions of products of symmetric power sums and of sums of symmetric power products with application to sampling. The Annals of Mathematical Statistics, 9(1), 1–47. doi:10.1214/aoms/1177732357
- Fisher, R. A. (1930). Moments and product moments of sampling distributions. Proceedings of the London Mathematical Society, 2(1), 199–238. doi:10.1112/plms/s2-30.1.199
- Gerlovina, I., & Hubbard, A. E. (2019). Umoments: Unbiased central moment estimates. R package version 0.1.0. Retrieved from https://CRAN.R-project.org/package=Umoments
- Hall, P. (1987). Edgeworth expansion for Student's t statistic under minimal moment conditions. The Annals of Probability, 15(3), 920–931. doi:10.1214/aop/1176992073
- Pebay, P. P., Terriberry, T., Kolla, H., & Bennett, J. C. (2014). Formulas for the computation of higher-order central moments. Livermore, CA: Sandia National Laboratories (SNL-CA).
- Putter, H., & van Zwet, W. R. (1998). Empirical Edgeworth expansions for symmetric statistics. The Annals of Statistics, 26(4), 1540–1569. doi:10.1214/aos/1024691253
- Rimoldini, L. (2014). Weighted skewness and kurtosis unbiased by sample size and Gaussian uncertainties. Astronomy and Computing, 5, 1–8. doi:10.1016/j.ascom.2014.02.001
- Rose, C., & Smith, M. (2002). Mathematical statistics with Mathematica, chapter 7: Moments of sampling distributions. New York, NY: Springer-Verlag.
- Tracy, D., & Gupta, B. (1974). Generalized h-statistics and other symmetric functions. The Annals of Statistics, 2(4), 837–844. doi:10.1214/aos/1176342774