Abstract
While unbiased central moment estimators of lower orders (such as a sample variance) are easily obtainable and often used in practice, derivation of unbiased estimators of higher orders might be more challenging due to long math and tricky combinatorics. Moreover, higher orders necessitate calculation of estimators of powers and products that also amount to these orders. We develop a software algorithm that allows the user to obtain unbiased estimators of an arbitrary order and provide results up to the sixth order, including powers and products of lower orders. The method also extends to finding pooled estimates of higher central moments of several different populations (e.g. for two-sample tests). We introduce an R package Umoments that calculates one- and two-sample estimates and generates intermediate results used to obtain these estimators.
PUBLIC INTEREST STATEMENT
Higher-order statistics are increasingly used in various research fields and data analysis, and central moment estimates are useful for many approaches. Derivation of higher-order unbiased central moment estimators has long been a challenging task; software made the general order solution possible. This paper describes a direct approach to obtaining estimators of any order, including multi-sample pooled estimators. It also introduces an open source R package Umoments, which calculates one- and two-sample estimates up to the sixth order and contains machinery to obtain even higher-order estimates, including a combinatorial algorithm that can be used for solving other problems and can assist in long derivations (e.g. Edgeworth expansions).
1. Introduction
Most data analysis methods rely on estimating unknown quantities such as characteristics of an underlying distribution or an effect of a treatment. From a variety of possible estimators of an unknown true parameter, the ones that are typically chosen have certain desirable properties—e.g. consistency, efficiency, or unbiasedness. When the sample size is moderate or small, finite sample behavior of an estimator—such as bias, variability, and mean squared error—is particularly relevant and is therefore often given special consideration. In addition, when estimation is conducted across multiple samples or studies (pooled estimators), bias may become an important issue.
Moments of a distribution are the most basic building blocks of statistical analysis and their estimates are present in some form in virtually any practical application. A sample average is an estimate of the mean (first moment). Estimates of the variance (second central moment) are routinely used in statistical inference; they are present in all studentized statistics, of which the most common example is an ordinary $t$-statistic. Higher moments and their estimates, while not as widely used, can be important in various statistical applications and inferential procedures; they are also the building blocks of cumulants and their scaled versions (skewness, kurtosis). They are often used in signal processing, financial modeling, and many other areas (for a list of applications, see Pebay, Terriberry, Kolla, & Bennett, Citation2014). Methods that employ higher-order statistics might utilize data in a more efficient way and offer greater insight into the distribution of interest, thus providing additional refinement in inference, for example through the use of higher-order approximations to the distribution of a test statistic—such as empirical Edgeworth expansions (Bickel, Citation1974; Hall, Citation1987; Putter & van Zwet, Citation1998). These methods require higher-order moment estimation and warrant consideration of estimators' finite sample properties; since moderate or small sample analysis would benefit from such higher-order approaches, unbiased estimates could prove particularly useful.
Naïve estimators $m_k = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^k$, $k = 2, 3, \ldots$, of central moments $\mu_k = E\left(X - EX\right)^k$ are biased—that is, $E\left(m_k\right) \neq \mu_k$. The first unbiased estimator was introduced for the variance by Friedrich Bessel; it is obtained by multiplying $m_2$ by a factor $\frac{n}{n-1}$ and is thus often called Bessel's correction. That estimator is part of an ordinary $t$-statistic and therefore plays a role in Student's $t$-distribution; it corresponds to the degrees of freedom in the chi-squared distribution that arises from $\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$ when $X$ is a standard normal random variable. The corresponding standard deviation estimator, however, is still biased (though the bias is reduced) and underestimates the true parameter. Interest in unbiased moment and cumulant estimation has a long history, which has led to theoretical advances and various strategies for obtaining higher-order estimators. The 1928 work of R.A. Fisher (Fisher, Citation1930) provided the basis for much of this research, particularly on cumulants; for central moments, unbiased estimators up to the fourth order (or "weight" in some literature) were published by Harald Cramér in 1946 (Cramér, Citation2016); the results were later extended to more complex settings (e.g. incorporating weights (Rimoldini, Citation2014)).
Whereas derivation of unbiased moment estimators is in general straightforward, higher-order calculations involve long algebra and nontrivial coefficients, and brute-force calculation of these coefficients becomes unfeasible fairly quickly. Having observations from different populations or categories, requiring pooled estimators, compounds the problem. Unlike the second and third central moments, where naïve biased moment estimators differ from unbiased ones by a constant factor that does not depend on the data, subsequent orders require calculation of combinations (integer powers and products) of lower moments that amount to the same order, which in turn creates systems of equations to be solved. With computer algebra, manipulating long algebraic expressions and solving reasonably large systems of linear equations is no longer an issue; the challenge can then be condensed to finding an expectation of the form $E\left[\prod_{j}\left(\sum_{i=1}^{n} X_i^{k_j}\right)\right]$, where $k_j \in \mathbb{N}$, of an arbitrary order and length, written in terms of the sample size $n$ and true central moments of the distribution of $X$. Thus, a software algorithm that solves this problem, together with computer algebra, can provide the machinery needed to obtain one-sample and multi-sample pooled estimates of any order, limited only by available processing power.
General order solutions for many problems formulated in the course of unbiased estimation history, including cumulant and moment estimation, are provided in mathStatica (Rose & Smith, Citation2002), an add-on package for the proprietary computer algebra system Mathematica. Still, given many potential uses for such estimates, there is a need for open-source software and easy-to-use tools, accessible to a wide range of researchers, that could be seamlessly incorporated into data analysis. Multi-sample pooled estimation, which has not received much attention in the higher-order statistics pursuit (and is not included in mathStatica), can have many practical applications, especially in two-sample settings (e.g. comparing treatment and control groups). In addition, open access to the code and algorithms used to generate arbitrary order estimates can help in obtaining other statistical results, e.g. Edgeworth expansions. We introduce an R package Umoments (Gerlovina & Hubbard, Citation2019), which provides pre-programmed functions that calculate one- and two-sample estimates up to the sixth order, either from data or from naïve biased estimates, as well as algorithms and tools for generating general order estimators.
In this paper, we break down the procedure for obtaining unbiased moment estimators of an arbitrary order, as well as estimators of products and powers of moments (also referred to as generalized $h$-statistics (Tracy & Gupta, Citation1974)) such as $\widehat{\mu_2^2}$; an analogous procedure is provided for multi-sample pooled estimators. Additionally, this direct approach is illustrated in the Sage and Jupyter templates at https://github.com/innager/unbiasedMoments. Next, we describe the algorithm that generates an expression for the expectation of raw (non-central) sample moments and their powers and products, thus automating the challenging part of the derivation. The Results section provides a full set of one-sample unbiased estimators up to the sixth order; two-sample pooled estimators can be found in the Umoments package, but orders four and higher are too long to include in the paper. The Results are followed by a quick overview of Umoments package functions; we conclude with a discussion of practical applications of these estimators.
2. Procedure in general
2.1. One-sample estimates
For simplicity, we can consider a mean-zero random variable without any loss of generality. Let $X_1, \ldots, X_n$ be a sample of independent identically distributed random variables with $E(X) = 0$ and central moments $\mu_k = E\left(X - EX\right)^k$ (mean $\mu_1 = 0$, variance $\mu_2 = \sigma^2$), in this case equal to raw moments: $\mu_k = E\left(X^k\right)$. We also adopt the following useful notation:

$m_k = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^k$—naïve biased central moment estimators;

$\widehat{(\,\cdot\,)}$—unbiased estimator of an expression inside the parentheses (for quantities such as central moments and their powers and products), e.g. $E\left(\widehat{\mu_2^2}\right) = \mu_2^2$.
The steps to obtain unbiased estimators of a general order are straightforward:

1. for a desired order, list all the moment combinations for that order (example provided below);
2. expand their naïve biased estimators (remove the brackets);
3. take expectations and represent the results in terms of moments $\mu_k$ and sample size $n$; this will produce an equation or a system of equations;
4. solve this equation or system of equations for the true moments.
As an illustration, we go through these steps for $\widehat{\mu_3}$, an unbiased estimator of the third central moment. The only third-order combination is $\mu_3$ itself; expanding $m_3$ and taking the expectation yields

$$E\left(m_3\right) = \frac{(n-1)(n-2)}{n^2}\,\mu_3,$$

and solving for $\mu_3$ gives

$$\widehat{\mu_3} = \frac{n^2}{(n-1)(n-2)}\,m_3. \tag{1}$$

Steps 1 and 2 are trivial and are performed using computer algebra. Calculation of any unbiased moment estimator of a given order involves all the combinations (powers and products of moments) of that order, which means that for fourth and higher orders there will be a system of equations rather than a single equation (recall that $\mu_1 = 0$). For example, estimators of the seventh order will include $\mu_7$, $\mu_2\mu_5$, $\mu_3\mu_4$, and $\mu_2^2\mu_3$ (Step 1); Step 2 will correspondingly expand $m_7$, $m_2 m_5$, $m_3 m_4$, and $m_2^2 m_3$, producing four equations. Since the equations need to be solved for a given order's combinations of true moments, not individual moments, and all the equations in the system are linear in that order, it makes sense to treat these combinations as single variables, thus solving a system of linear equations.
Step 3 is more challenging, but the problem can be reduced to finding an expression for $E\left[\prod_{j}\left(\sum_{i=1}^{n} X_i^{k_j}\right)\right]$ since any term from the right-hand side of the Step 3 equations can be written in that form—e.g. $E\left(\bar{X}^2\sum_{i=1}^{n} X_i^3\right) = \frac{1}{n^2}\,E\left[\left(\sum_{i=1}^{n} X_i\right)^2\sum_{i=1}^{n} X_i^3\right]$. A general solution to this problem is provided in the Umoments package (Gerlovina & Hubbard, Citation2019), which generates expressions for these expectations using combinatorics. This algorithm is explained in detail in Section 3.
2.2. Pooled estimates
A simple extension of the method can be used to obtain unbiased estimators of central moments for samples that contain observations from several populations or categories. We demonstrate the procedure on a two-category estimation, which extends trivially to any number of categories.
For a sample $X_1, \ldots, X_{n_x}, Y_1, \ldots, Y_{n_y}$, let

$$m_k = \frac{1}{n_x + n_y}\left[\sum_{i=1}^{n_x}\left(X_i - \bar{X}\right)^k + \sum_{i=1}^{n_y}\left(Y_i - \bar{Y}\right)^k\right],$$

where $m_k$ is a two-sample analog of the naïve biased estimator described previously. Note that pooled estimation implies an assumption of equality of the estimated central moments between the distributions of $X$ and $Y$: $\mu_k = E\left(X - EX\right)^k = E\left(Y - EY\right)^k$. Using this assumption, the independence of $X$ and $Y$, and the one-sample results from Step 3 in Section 2.1, we extend Step 2 of the roadmap to incorporate two variables and obtain expectations.
Example: obtain a two-sample pooled estimate of the third central moment. Using the one-sample result (1), we get

$$E\left[\sum_{i=1}^{n_x}\left(X_i - \bar{X}\right)^3 + \sum_{i=1}^{n_y}\left(Y_i - \bar{Y}\right)^3\right] = \left[\frac{(n_x - 1)(n_x - 2)}{n_x} + \frac{(n_y - 1)(n_y - 2)}{n_y}\right]\mu_3, \tag{2}$$

which yields

$$\widehat{\mu_3} = \left[\frac{(n_x - 1)(n_x - 2)}{n_x} + \frac{(n_y - 1)(n_y - 2)}{n_y}\right]^{-1}\left[\sum_{i=1}^{n_x}\left(X_i - \bar{X}\right)^3 + \sum_{i=1}^{n_y}\left(Y_i - \bar{Y}\right)^3\right]. \tag{3}$$

For this particular example, the result matches the one-sample case if $n_x = n_y$. That is not true in general, however. All the higher orders involve powers and products of lower moments that need to be expanded before taking expectations, affecting the systems of equations.
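The pooled third-moment construction can be verified the same way as the one-sample case: by independence, the expectation of the pooled numerator is the sum of the two one-sample expectations, so dividing by the combined factor leaves an unbiased estimator. A small Python check (an illustration outside the R package, assuming both samples come from the same two-point mean-zero distribution):

```python
from itertools import product

def exact_E_sum_cubes(vals, probs, n):
    """Exact E[sum_i (X_i - Xbar)^3] over all samples of size n from a
    finite-support distribution."""
    total = 0.0
    for smp in product(range(len(vals)), repeat=n):
        p = 1.0
        for i in smp:
            p *= probs[i]
        x = [vals[i] for i in smp]
        xbar = sum(x) / n
        total += p * sum((xi - xbar) ** 3 for xi in x)
    return total

# both samples from the same mean-zero two-point distribution (mu_3 = 2)
vals, probs = [2.0, -1.0], [1 / 3, 2 / 3]
mu3, nx, ny = 2.0, 4, 3
# by independence, the expected pooled numerator is the sum of the two
# one-sample expectations E[sum (X - Xbar)^3] = (n-1)(n-2)/n * mu3
numerator = exact_E_sum_cubes(vals, probs, nx) + exact_E_sum_cubes(vals, probs, ny)
denominator = (nx - 1) * (nx - 2) / nx + (ny - 1) * (ny - 2) / ny
pooled = numerator / denominator                    # recovers mu3 exactly
```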
3. Generating expressions for expectations
To derive general order expectations of naïve moment estimators and their powers and products, one needs to find expectations of products of power sums, $E\left[\prod_{d}\left(\sum_{i=1}^{n} X_i^{k_d}\right)^{l_d}\right]$. To build up to this, we first describe the procedure for generating $E\left[\left(\sum_{i=1}^{n} X_i\right)^m\right]$, which easily extends to $E\left[\left(\sum_{i=1}^{n} X_i\right)^m\left(\sum_{i=1}^{n} X_i^k\right)^l\right]$ and then to the general case that involves an arbitrarily long product of power sums.
3.1. Generate $E\left[\left(\sum_{i=1}^{n} X_i\right)^m\right]$

$$E\left[\left(\sum_{i=1}^{n} X_i\right)^m\right] = \sum_{i_1=1}^{n}\cdots\sum_{i_m=1}^{n} E\left(X_{i_1} X_{i_2}\cdots X_{i_m}\right). \tag{4}$$
To find Equation (4), we need to consider all the different combinations of ordered indices $(i_1, \ldots, i_m)$; $1 \leq i_j \leq n$ for each $j$. There are $n^m$ such combinations, but many combinations yield the same $E\left(X_{i_1}\cdots X_{i_m}\right)$—for example, $E\left(X_1^2 X_2^2 X_3^3\right) = E\left(X_2^2 X_5^2 X_7^3\right)$. Combinations that produce the same expectation form a set that we will call a grouping (similar to "partitions" and "augmented symmetric functions" in some terminology), and the problem therefore reduces to considering all the groupings (each producing a distinct expectation) and calculating their coefficients, which are the numbers of combinations in each set. Each product $X_{i_1}\cdots X_{i_m}$ can be broken into smaller products, or groups, of $X$'s with the same indices, such as $X_1\cdot X_1 = X_1^2$, $X_3\cdot X_3\cdot X_3 = X_3^3$. The number of groups ranges between $1$ (when all the indices are the same: $E\left(X_i^m\right) = \mu_m$) and $m$ (when all the indices are different: $E\left(X_{i_1}\right)\cdots E\left(X_{i_m}\right) = 0$); the sizes of these groups determine $E\left(X_{i_1}\cdots X_{i_m}\right)$. Thus, each grouping is fully characterized by the number of groups and the group sizes.
Let $K$ denote the number of groups in one grouping and $t_1, \ldots, t_K$—the numbers of $X$'s in each group, $\sum_{j=1}^{K} t_j = m$; the set of group sizes is unordered, so assigning indices to $t$'s is arbitrary (e.g. decreasing). In the example above: $K = 3$, $t_1 = 3$, $t_2 = 2$, $t_3 = 2$, and $E\left(X_1^2 X_2^2 X_3^3\right) = \mu_3\mu_2^2$. If $\min_j t_j = 1$ (at least one group is of size $1$), the expectation is $0$ since $E\left(X_i\right) = 0$ and there is no need to calculate a coefficient for this grouping, which is important in terms of computational efficiency; otherwise, $E\left(X_{i_1}\cdots X_{i_m}\right) = \prod_{j=1}^{K}\mu_{t_j}$. By adding a subscript $g$ to indicate a grouping, we get

$$E\left[\left(\sum_{i=1}^{n} X_i\right)^m\right] = \sum_{g}\, c_g \prod_{j=1}^{K_g}\mu_{t_{j,g}}, \tag{5}$$

where $c_g$ is the coefficient for grouping $g$, i.e. the number of combinations that yield $\prod_{j}\mu_{t_{j,g}}$.
The coefficient is calculated as

$$c_g = \frac{m!}{t_1!\cdots t_K!}\cdot\frac{1}{r_1!\cdots r_L!}\; n(n-1)\cdots(n-K+1), \tag{6}$$

where $L$ is the number of distinct group sizes and the $r_l$'s are the numbers of same-sized groups if there are any—e.g. for group sizes $4$, $4$, and $2$, we get $L = 2$, $r_1 = 2$ (two groups of size $4$), and $r_2 = 1$. In this particular setting, $c_g$ is analogous to the partition coefficient described in the literature (Dwyer, Citation1938; Fisher, Citation1930). Going back to our original example (group sizes $3$, $2$, $2$), there is only one repeated size: $r_1 = 2$; the coefficient for that example is $c_g = \frac{7!}{3!\,2!\,2!}\cdot\frac{1}{2!}\; n(n-1)(n-2) = 105\, n(n-1)(n-2)$.
One way of arriving at the expression for $c_g$ could be the following: there are $\binom{n}{K}\frac{K!}{r_1!\cdots r_L!}$ ways to pick (unordered) indices that satisfy the given group sizes $\{t_1, \ldots, t_K\}$ and $\frac{m!}{t_1!\cdots t_K!}$ ways to place these indices on $m$ positions (a multinomial coefficient).
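This combinatorial argument can be tested directly against brute-force enumeration. A Python sketch (illustrative, not package code) that classifies all $n^m$ ordered index tuples by their multiset of group sizes and compares the counts with the closed form:

```python
from collections import Counter
from itertools import product
from math import factorial, perm

def grouping_counts(n, m):
    """Brute force: over all n^m ordered index tuples (i_1, ..., i_m),
    count how many tuples produce each multiset of group sizes."""
    counts = Counter()
    for idx in product(range(n), repeat=m):
        sizes = tuple(sorted(Counter(idx).values(), reverse=True))
        counts[sizes] += 1
    return counts

def coefficient(sizes, n):
    """Closed form: m!/prod(t_j!) * 1/prod(r_l!) * n(n-1)...(n-K+1),
    where r_l are the multiplicities of repeated group sizes."""
    m, K = sum(sizes), len(sizes)
    c = factorial(m)
    for t in sizes:
        c //= factorial(t)
    for r in Counter(sizes).values():
        c //= factorial(r)
    return c * perm(n, K)

n, m = 5, 4
brute = grouping_counts(n, m)
# e.g. the pair grouping {2, 2}: 3 * n * (n - 1) = 60 index tuples for n = 5
```

Every group-size signature, including those that later drop out because of a singleton group, is counted here, so the counts sum to $n^m$.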
Our software generates expressions for $E\left[\left(\sum_{i=1}^{n} X_i\right)^m\right]$ for a given $m$ using the method described above. To find all the possible groupings, we impose an ordering on them and use it to generate each consecutive grouping when the previous one is given, thus moving through a complete set of groupings from $\{1, 1, \ldots, 1\}$ to $\{m\}$. For example, in an agglomerative order, a grouping $\{3, 1, 1\}$ is preceded by $\{2, 2, 1\}$ and followed by $\{3, 2\}$.
The smallest number of groups is $K = 1$, which produces a term of order $n^{-(m-1)}$ once the power sums are normalized by $n$ (the highest order in the range); the largest $K$ with a non-zero contribution to the expectation is $K = \lfloor m/2\rfloor$ (when the indices of $X_{i_1}\cdots X_{i_m}$ appear in pairs and there are no unpaired indices; when $m$ is odd, one of the groups is of size $3$), and the order it produces is $n^{-\lceil m/2\rceil}$.
3.2. Generate $E\left[\left(\sum_{i=1}^{n} X_i\right)^m\left(\sum_{i=1}^{n} X_i^k\right)^l\right]$

$$E\left[\left(\sum_{i=1}^{n} X_i\right)^m\left(\sum_{i=1}^{n} X_i^k\right)^l\right]. \tag{7}$$
To generate expressions for Equation (7), we extend the algorithm described in Section 3.1 for Equation (4). Now groups consist of $X_i$'s and $X_i^k$'s with the same indices, e.g. $X_1\cdot X_1\cdot X_1^k = X_1^{2+k}$, $X_2^k\cdot X_2^k = X_2^{2k}$, and are thus described not by a single number (group size) but by a pair $(t, s)$, where $t$ is the number of $X_i$'s and $s$ is the number of $X_i^k$'s in the group. Consequently, a grouping in this version is characterized by a set of pairs $(t_j, s_j)$, $j = 1, \ldots, K$; $\sum_j t_j = m$, $\sum_j s_j = l$, and its definition is different from the one in Section 3.1 since for given group totals $t_j + k\,s_j$ there can be different groupings that yield the same expectation; e.g. for $k = 2$, $m = 4$, and $l = 2$, the groupings $\{(4, 0), (0, 1), (0, 1)\}$, $\{(2, 1), (2, 0), (0, 1)\}$, and $\{(0, 2), (2, 0), (2, 0)\}$ will all produce $\mu_4\mu_2^2$. Analogously to the original version, if $t_j = 1$ and $s_j = 0$ for some $j$ (at least one pair in the grouping is $(1, 0)$), the expectation is $0$; otherwise, the expectation is $\prod_{j=1}^{K}\mu_{t_j + k\,s_j}$. Note that to account for all the possible groupings in this case, permutations need to be used, adding another layer to computational complexity.
The coefficient $c_g$ for a grouping $g$ is calculated in a similar way to Section 3.1 (Equation (6)) with a few adjustments:

$$c_g = \frac{m!}{t_1!\cdots t_K!}\cdot\frac{l!}{s_1!\cdots s_K!}\cdot\frac{1}{r_1!\cdots r_L!}\; n(n-1)\cdots(n-K+1), \tag{8}$$

where the $r_l$'s are the numbers of the groups with the same values for both $t$ and $s$.
In this case, the order of the terms ranges from $n^{-(m+l-1)}$ after normalization, when $K = 1$ (the expectation reduces to a multiple of $\mu_{m+kl}$), to $n^{-\lceil m/2\rceil}$, when all indices of the $X_i$'s appear in pairs if $m$ is even (an "extra" index joining one of the groups if $m$ is odd), and all the $X_i^k$'s carry indices different from the $X_i$'s and from each other ($K = \lfloor m/2\rfloor + l$).
3.3. General case
The procedure in Section 3.2 easily generalizes to finding $E\left[\prod_{d=1}^{D}\left(\sum_{i=1}^{n} X_i^{k_d}\right)^{l_d}\right]$ for an arbitrary number of power sums $D$, with groups described by a "tuple" of length $D$ and a grouping being a collection of such tuples. Coefficients $c_g$ for groupings $g$ are calculated similarly to Equation (8), accounting for all the elements in each tuple.
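The general case can also be brute-forced directly, which is feasible only for small $n$ and low orders (unlike the grouping algorithm) but is useful for validation. A Python sketch that enumerates all index tuples; with `powers = [1, 1, 1, 3, 3, 4]` it corresponds to the `one_combination(c(3, 0, 2, 1), "n_x")` example in Section 5 and reproduces its coefficients, e.g. the $n_x\mu_{13}$ and $3n_x(n_x-1)\mu_2\mu_{11}$ terms, up to the overall $n_x^{-6}$ normalization:

```python
from collections import Counter
from itertools import product

def power_sum_expectation(powers, n):
    """E[ prod_d (sum_{i=1}^n X_i^{k_d}) ] for mean-zero i.i.d. X, returned
    as {sorted tuple of moment orders: integer coefficient}; `powers` lists
    one exponent k_d per power-sum factor (repeat a factor to raise its
    power)."""
    result = Counter()
    for idx in product(range(n), repeat=len(powers)):
        groups = Counter()
        for i, k in zip(idx, powers):
            groups[i] += k              # total exponent collected by index i
        orders = tuple(sorted(groups.values()))
        if orders[0] == 1:              # a lone X_i contributes E(X) = 0
            continue
        result[orders] += 1
    return dict(result)

# (sum X_i)^3 * (sum X_i^3)^2 * (sum X_i^4), evaluated at n = 2
res = power_sum_expectation([1, 1, 1, 3, 3, 4], n=2)
```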
4. Results (up to sixth order)
Below are the results generated with our software (SymPy code that produces these results is in a Jupyter notebook at https://github.com/innager/unbiasedMoments):
For two-sample pooled estimators up to the sixth order, refer to the Umoments package (Gerlovina & Hubbard, Citation2019) and the Sage worksheet at https://github.com/innager/unbiasedMoments.
5. Umoments R package
Umoments contains a set of pre-programmed functions that calculate one-sample and pooled two-sample unbiased moment estimates, both up to the sixth order. This functionality is primarily useful for data analysis. The estimates can be calculated either directly from the sample or from naïve biased estimates, in which case the sample size needs to be provided. For two-sample estimation, the input should also include labels indicating which observation belongs to which sample/category, or both sample sizes $n_x$ and $n_y$. Below are some examples.
Two-sample pooled estimates from the data up to sixth order (note that smp is a data vector, and treatment is a vector of labels that separates it into two categories):
> uMpool(smp, treatment, 6)
M2 M3 M2pow2 M4 M2M3 M5 M2pow3
1.6443027 1.5188515 2.4878505 6.9794503 2.0615514 17.0989234 3.5236856
M3pow2 M2M4 M6
0.6674177 9.4046220 56.6016025
Unbiased estimate of $\mu_3^2$ from naïve biased second, third, fourth, and sixth moment estimates:
> uM3pow2(m[2], m[3], m[4], m[6], n)
[1] -10.00696
Other functions in the package can be used to obtain higher-order estimators, pooled estimators across multiple (three or more) samples, and other statistical results.
Generate $E\left[\left(\frac{1}{n_x}\sum_{i} X_i\right)^3\left(\frac{1}{n_x}\sum_{i} X_i^3\right)^2\left(\frac{1}{n_x}\sum_{i} X_i^4\right)\right]$ for a sample $X_1, \ldots, X_{n_x}$ (the output is a string that could be used as a code chunk, fed into a computer algebra system, or converted into LaTeX):
> one_combination(c(3, 0, 2, 1), "n_x")
[1] "(1*n_x*mu13^1 + 3*n_x*(n_x-1)*mu2^1*mu11^1 + 3*n_x*(n_x-1)*(n_x-2)*(n_x-3)*mu5^1*mu3^2*mu2^1 + 6*n_x*(n_x-1)*(n_x-2)*(n_x-3)*mu4^2*mu3^1*mu2^1 + 6*n_x*(n_x-1)*(n_x-2)*mu8^1*mu3^1*mu2^1 + 9*n_x*(n_x-1)*(n_x-2)*mu7^1*mu4^1*mu2^1 + 3*n_x*(n_x-1)*(n_x-2)*mu6^1*mu5^1*mu2^1 + 3*n_x*(n_x-1)*mu3^1*mu10^1 + 1*n_x*(n_x-1)*(n_x-2)*(n_x-3)*mu4^1*mu3^3 + 3*n_x*(n_x-1)*(n_x-2)*mu7^1*mu3^2 + 9*n_x*(n_x-1)*(n_x-2)*mu6^1*mu4^1*mu3^1 + 6*n_x*(n_x-1)*(n_x-2)*mu5^2*mu3^1 + 12*n_x*(n_x-1)*(n_x-2)*mu5^1*mu4^2 + 7*n_x*(n_x-1)*mu9^1*mu4^1 + 9*n_x*(n_x-1)*mu8^1*mu5^1 + 6*n_x*(n_x-1)*mu7^1*mu6^1)/n_x^6"
Generate groupings for $m = 5$ (see Section 3.1):
> Umoments:::groups(5)
[[1]]
[1] 1 1 1 1 1
[[2]]
[1] 2 1 1 1
[[3]]
[1] 2 2 1
[[4]]
[1] 3 1 1
[[5]]
[1] 3 2
[[6]]
[1] 4 1
[[7]]
[1] 5
For further details and examples, refer to package vignette and documentation (Gerlovina & Hubbard, Citation2019).
6. Discussion
The difference between unbiased and biased estimators depends on the sample size and might be considerable for small samples; also, for a fixed sample size, it is relatively greater for higher orders. At the same time, variability of the estimators is an important factor to be considered in this bias-variance trade-off, especially in connection with sample size and the order of the estimators, as variability increases with higher orders (which might be offset by the lower contribution/weight of these orders in certain methods) and smaller samples. Another question is the relationship between $n$ and the maximal order that could reasonably be used in a method; besides the purely algebraic restrictions on the sample size for a given order, apparent from the expressions for unbiased estimators ($n \geq k$ for $k$th-order estimators), there might be another, stricter underlying relationship that needs to be explored, either theoretically or numerically.
While unbiased estimators of products and integer powers of moments are possible to obtain, that is not the case with ratios and roots. Of course, such biased estimators, like the square root of a sample variance or a skewness estimator, are widely used in practice. Adding to the complexity is the fact that since an unbiased estimator of a ratio cannot be obtained, simplifying expressions should also be questioned—consider, for example, the scaled sixth cumulant

$$\frac{\kappa_6}{\mu_2^3} = \frac{\mu_6 - 15\mu_4\mu_2 - 10\mu_3^2 + 30\mu_2^3}{\mu_2^3}.$$

For a closest estimate, it is natural to consider the ratio of an unbiased cumulant estimator and an unbiased scaling factor $\widehat{\mu_2^3}$. Then, is $\widehat{\mu_4\mu_2}\,/\,\widehat{\mu_2^3}$ preferable to $\widehat{\mu_4}\,/\,\widehat{\mu_2^2}$ for the second term?
This example also provides an illustration of another important consideration that should factor into the decision of which estimators to use—variability of the denominator in studentized statistics. In the scaled cumulant above, the sixth cumulant $\kappa_6$ is scaled by $\mu_2^3$; to substitute for this unknown quantity, a variety of estimators can be used: $m_2^3$, $\widehat{\mu_2}^{\,3}$, or $\widehat{\mu_2^3}$, to name a few. While the expression for $\widehat{\mu_2}$ (and thus its cube) contains $m_2$ only, the expression for $\widehat{\mu_2^3}$ includes higher-order estimates such as $m_4$ and $m_6$ as well. These higher-order quantities may be highly variable, especially in small samples, and therefore the whole ratio becomes highly sensitive to small values of the estimates in the denominator, which can inflate the statistic dramatically, increasing the variability of the ratio to the point of unusability. Therefore, it might be advisable in certain cases, e.g. with considerably skewed distributions, to perform some numeric exploration to determine whether it might indeed be preferable to use lower-order estimators, naïve biased or unbiased, in place of the parameters in denominators because of their relative stability ("power of mean" instead of "mean of power").
Correction
This article has been republished with minor changes. These changes do not impact the academic content of the article.
Additional information
Funding
Notes on contributors
Inna Gerlovina
Inna Gerlovina is a postdoctoral scholar at the University of California, San Francisco. She has worked on small sample inference and error rate control as well as higher-order inferential approaches, developing software packages for high-dimensional data analysis. Her current interests include development, implementation, and application of statistical methods that contribute to the understanding of malaria epidemiology and transmission. Inna completed her MA and PhD in Biostatistics at the University of California, Berkeley.
Alan E. Hubbard
Alan Hubbard, Professor of Biostatistics, University of California, Berkeley, is a director of the Biomedical Big Data pre-doctoral training program, and co-director of the Center of Targeted Learning, is head of the computational biology Core E of the SuperFund Center at UC Berkeley (NIH/EPA), as well a consulting statistician on several federally funded and foundation projects. He has worked as well on projects ranging from molecular biology of aging, epidemiology, and infectious disease modeling, but most all of his work has focused on semi-parametric estimation in high-dimensional data. His current methods-research focuses on precision medicine, variable importance, statistical inference for data-adaptive parameters, and statistical software implementing targeted learning methods. He is currently working in several areas of applied research, including early childhood development in developing countries, patient outcomes from acute trauma, environmental genomics and comparative effectiveness research in diabetes care.
References
- Bickel, P. (1974). Edgeworth expansions in nonparametric statistics. The Annals of Statistics, 2, 1–11. doi:10.1214/aos/1176342609
- Cramér, H. (2016). Mathematical methods of statistics (PMS-9) (Vol. 9). Princeton, NJ: Princeton University Press.
- Dwyer, P. S. (1938). Combined expansions of products of symmetric power sums and of sums of symmetric power products with application to sampling. The Annals of Mathematical Statistics, 9(1), 1–47. doi:10.1214/aoms/1177732357
- Fisher, R. A. (1930). Moments and product moments of sampling distributions. Proceedings of the London Mathematical Society, 2(1), 199–238. doi:10.1112/plms/s2-30.1.199
- Gerlovina, I., & Hubbard, A. E. (2019). Umoments: Unbiased central moment estimates. R package version 0.1.0. Retrieved from https://CRAN.R-project.org/package=Umoments
- Hall, P. (1987). Edgeworth expansion for Student's t statistic under minimal moment conditions. The Annals of Probability, 15(3), 920–931. doi:10.1214/aop/1176992073
- Pebay, P. P., Terriberry, T., Kolla, H., & Bennett, J. C. (2014). Formulas for the computation of higher-order central moments. Livermore, CA: Sandia National Laboratories (SNL-CA).
- Putter, H., & van Zwet, W. R. (1998). Empirical Edgeworth expansions for symmetric statistics. The Annals of Statistics, 26(4), 1540–1569. doi:10.1214/aos/1024691253
- Rimoldini, L. (2014). Weighted skewness and kurtosis unbiased by sample size and Gaussian uncertainties. Astronomy and Computing, 5, 1–8. doi:10.1016/j.ascom.2014.02.001
- Rose, C., & Smith, M. (2002). Mathematical statistics with Mathematica, chapter 7: Moments of sampling distributions. New York, NY: Springer-Verlag.
- Tracy, D., & Gupta, B. (1974). Generalized h-statistics and other symmetric functions. The Annals of Statistics, 2(4), 837–844. doi:10.1214/aos/1176342774