Research Article

Computer algebra and algorithms for unbiased moment estimation of arbitrary order

Article: 1701917 | Received 07 Aug 2019, Accepted 03 Dec 2019, Published online: 21 Dec 2019

Abstract

While unbiased central moment estimators of lower orders (such as a sample variance) are easily obtainable and often used in practice, derivation of unbiased estimators of higher orders might be more challenging due to long math and tricky combinatorics. Moreover, higher orders necessitate calculation of estimators of powers and products that also amount to these orders. We develop a software algorithm that allows the user to obtain unbiased estimators of an arbitrary order and provide results up to the sixth order, including powers and products of lower orders. The method also extends to finding pooled estimates of higher central moments of several different populations (e.g. for two-sample tests). We introduce an R package Umoments that calculates one- and two-sample estimates and generates intermediate results used to obtain these estimators.

PUBLIC INTEREST STATEMENT

Higher-order statistics are increasingly used in various research fields and data analysis, and central moment estimates are useful for many approaches. Derivation of higher-order unbiased central moment estimators has long been a challenging task; software made the general order solution possible. This paper describes a direct approach to obtaining estimators of any order, including multi-sample pooled estimators. It also introduces an open source R package Umoments, which calculates one- and two-sample estimates up to sixth order and contains machinery to obtain even higher-order estimates, including a combinatorial algorithm that can be used for solving other problems and assist in long derivations (e.g. Edgeworth expansions).

1. Introduction

Most data analysis methods rely on estimating unknown quantities such as characteristics of an underlying distribution or an effect of a treatment. From a variety of possible estimators of an unknown true parameter, the ones that are typically chosen have certain desirable properties—e.g. consistency, efficiency, or unbiasedness. When the sample size is moderate or small, finite sample behavior of an estimator—such as bias, variability, and mean squared error—is particularly relevant and is therefore often given special consideration. In addition, when estimation is conducted across multiple samples or studies (pooled estimators), bias may become an important issue.

Moments of a distribution are the most basic building blocks of statistical analysis, and their estimates are present in some form in virtually any practical application. A sample average is an estimate of the mean (first moment). Estimates of the variance (second central moment) are routinely used in statistical inference; they are present in all studentized statistics, of which the most common example is an ordinary t-statistic. Higher moments and their estimates, while not as widely used, can be important in various statistical applications and inferential procedures; they are also the building blocks of cumulants and their scaled versions (skewness, kurtosis). They are often used in signal processing, financial modeling, and many other areas (for a list of applications, see Pebay, Terriberry, Kolla, & Bennett, 2014). Methods that employ higher-order statistics might utilize data in a more efficient way and offer greater insight into the distribution of interest, thus providing additional refinement in inference, for example through the use of higher-order approximations to the distribution of a test statistic, such as empirical Edgeworth expansions (Bickel, 1974; Hall, 1987; Putter & van Zwet, 1998). These methods require higher-order moment estimation and warrant consideration of estimators' finite-sample properties; since moderate or small sample analysis would benefit most from such higher-order approaches, unbiased estimates could prove particularly useful.

Naïve estimators $m_k = n^{-1}\sum_{i=1}^n (X_i - \bar{X})^k$, $k = 2, 3, \dots$, of central moments $\mu_k$ are biased, that is, $E(m_k) \neq \mu_k$. The first unbiased estimator was introduced for the variance by Friedrich Bessel; it is obtained by multiplying $m_2$ by a factor $n/(n-1)$, a step often called Bessel's correction. That estimator is part of an ordinary t-statistic and therefore plays a role in Student's t-distribution; it corresponds to the degrees of freedom in the chi-squared distribution that arises from $\sum_{i=1}^n (X_i - \bar{X})^2 \sim \chi^2_{n-1}$ when $X$ is a standard normal random variable. The corresponding standard deviation estimator, however, is still biased (though the bias is reduced) and underestimates the true parameter. Interest in unbiased moment and cumulant estimation has a long history, which led to theoretical advances and various strategies for obtaining higher-order estimators. The work of R. A. Fisher (Fisher, 1930) provided the basis for much of this research, particularly on cumulants; for central moments, unbiased estimators up to the fourth order (or "weight" in some literature) were published by Harald Cramér in 1946 (Cramér, 2016); the results were later expanded to more complex settings, e.g. weighted samples (Rimoldini, 2014).

Whereas derivation of unbiased moment estimators is in general straightforward, higher-order calculations involve long algebra and nontrivial coefficients, and brute-force calculation of these coefficients becomes infeasible fairly quickly. Having observations from different populations or categories, which requires pooled estimators, compounds the problem. Unlike the second and third central moments, where naïve biased estimators differ from unbiased ones by a constant factor that does not depend on the data, subsequent orders require calculation of combinations (integer powers and products) of lower moments that amount to the same order, which in turn creates systems of equations to be solved. With computer algebra, manipulating long algebraic expressions and solving reasonably large systems of linear equations is no longer an issue; the challenge can then be condensed to finding an expectation of the form $E\big(\bar{X}^{j_1}\,\overline{X^2}^{j_2}\,\overline{X^3}^{j_3}\cdots\big)$, where $\overline{X^j} = n^{-1}\sum_{i=1}^n X_i^j$, of arbitrary order and length, written in terms of the sample size and the true central moments of the distribution of $X$. Thus, a software algorithm that solves this problem, combined with computer algebra, provides the machinery needed to obtain one-sample and multi-sample pooled estimates of any order, limited only by available processing power.

General order solutions for many problems formulated in the course of unbiased estimation history, including cumulant and moment estimation, are provided in mathStatica (Rose & Smith, 2002), an add-on package for the proprietary computer algebra system Mathematica. Still, given many potential uses for such estimates, there is a need for open-source software and easy-to-use tools, accessible to a wide range of researchers, that could be seamlessly incorporated into data analysis. Multi-sample pooled estimation, which has not received much attention in the pursuit of higher-order statistics (and is not included in mathStatica), can have many practical applications, especially in two-sample settings (e.g. comparing treatment and control groups). In addition, open access to the code and algorithms used in generating arbitrary order estimates makes them available for obtaining other statistical results, e.g. Edgeworth expansions. We introduce an R package Umoments (Gerlovina & Hubbard, 2019), which provides pre-programmed functions that calculate one- and two-sample estimates up to the sixth order, either from data or from naïve biased estimates, as well as algorithms and tools for generating general order estimators.

In this paper, we break down the procedure of obtaining unbiased moment estimators of an arbitrary order, as well as estimators of products and powers of moments (also referred to as generalized h-statistics (Tracy & Gupta, 1974)) such as $\mu_i^k\mu_j^l$; an analogous procedure is provided for multi-sample pooled estimators. Additionally, this direct approach is illustrated in the Sage and Jupyter templates at https://github.com/innager/unbiasedMoments. Next, we describe the algorithm that generates an expression for the expectation of raw (non-central) sample moments and their powers and products, thus automating the challenging part of the derivation. The Results section provides a full set of one-sample unbiased estimators up to the sixth order; two-sample pooled estimators can be found in the Umoments package, but orders four and higher are too long to include in the paper. The results are followed by a quick overview of Umoments package functions; we conclude with a discussion of practical applications of these estimators.

2. Procedure in general

2.1. One-sample estimates

For simplicity, we can consider a mean-zero random variable without any loss of generality. Let $X_1, \dots, X_n$ be a sample of independent identically distributed random variables with $E(X) = 0$ and central moments $\mu_k$ (mean $\mu_1 = 0$, variance $\mu_2$), in this case equal to raw moments; $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$. We also adopt the following useful notation:

$\overline{X^j} = \frac{1}{n}\sum_{i=1}^n X_i^j$;

$m_k = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^k$, the naïve biased central moment estimators;

$M(\cdot)$, an unbiased estimator of the expression inside the parentheses (for quantities such as central moments and their powers and products), e.g. $E\big[M(\mu_3^2)\big] = \mu_3^2$.

The steps to obtain unbiased estimators of a general order are straightforward:

  1. for a desired order, list all the moment combinations for that order (example provided below);

  2. expand their naïve biased estimators (remove the brackets);

  3. take expectations and represent the results in terms of moments $\mu_k$ and sample size $n$; this will produce an equation or a system of equations;

  4. solve this equation or a system of equations for true moments.
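As a small illustration of Step 1, the moment combinations of a given order are just the integer partitions of that order into parts of size at least two (parts of size one drop out because $\mu_1 = 0$). A minimal Python sketch; the function name moment_combinations is ours, not part of Umoments:

```python
def moment_combinations(order, min_part=2):
    """All ways to write `order` as a sum of nondecreasing parts >= min_part;
    each sum corresponds to one product of central moments of that total order.
    Parts of size 1 are excluded because mu_1 = 0."""
    if order == 0:
        return [[]]
    combos = []
    for part in range(min_part, order + 1):
        for rest in moment_combinations(order - part, part):
            combos.append([part] + rest)
    return combos

# Step 1 for sixth- and seventh-order estimators:
print(moment_combinations(6))  # -> [[2, 2, 2], [2, 4], [3, 3], [6]]
print(moment_combinations(7))  # -> [[2, 2, 3], [2, 5], [3, 4], [7]]
```

The four seventh-order combinations correspond to $M(\mu_2^2\mu_3)$, $M(\mu_2\mu_5)$, $M(\mu_3\mu_4)$, and $M(\mu_7)$.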

As an illustration, we go through these steps for M(μ3), an unbiased estimator of a third central moment:

$m_3 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^3 = \frac{1}{n}\sum_{i=1}^n X_i^3 - \frac{3}{n}\sum_{i=1}^n X_i^2\bar{X} + \frac{3}{n}\sum_{i=1}^n X_i\bar{X}^2 - \frac{1}{n}\sum_{i=1}^n \bar{X}^3,$

(1) $E(m_3) = E\big(\overline{X^3}\big) - 3E\big(\bar{X}\,\overline{X^2}\big) + 3E\big(\bar{X}^3\big) - E\big(\bar{X}^3\big) = \mu_3 - \frac{3}{n}\mu_3 + \frac{2}{n^2}\mu_3,$

$\mu_3 = \frac{n^2}{(n-1)(n-2)}\,E(m_3),$

$M(\mu_3) = \frac{n^2}{(n-1)(n-2)}\,m_3.$

Steps 2 and 4 are trivial and are performed using computer algebra. Calculation of any unbiased moment estimator of a given order involves all the combinations (powers and products of moments) of that order, which means that for the fourth and higher orders there will be a system of equations rather than a single equation (recall that $\mu_1 = 0$). For example, estimators of the seventh order will include $M(\mu_7)$, $M(\mu_2\mu_5)$, $M(\mu_2^2\mu_3)$, and $M(\mu_3\mu_4)$ (Step 1); Step 2 will correspondingly expand $m_7$, $m_2m_5$, $m_2^2m_3$, and $m_3m_4$, producing four equations. Since the equations need to be solved for a given order's combinations of true moments, not individual moments, and all the equations in the system are linear in that order, it makes sense to treat these combinations as single variables, thus solving a system of linear equations.
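Equation (1) can be spot-checked exactly, without any algebra system, by enumerating all possible samples from a small discrete distribution; the distribution need not be centered, since $m_3$ is location-invariant. A verification sketch of ours (not package code), exact thanks to rational arithmetic:

```python
from fractions import Fraction
from itertools import product

# Arbitrary small discrete distribution (support values and probabilities).
support = [Fraction(-1), Fraction(0), Fraction(2)]
probs = [Fraction(1, 2), Fraction(1, 5), Fraction(3, 10)]
n = 4

mean = sum(p * v for v, p in zip(support, probs))
mu3 = sum(p * (v - mean) ** 3 for v, p in zip(support, probs))

# Exact E(m3): enumerate all n-tuples of support values with their probabilities.
E_m3 = Fraction(0)
for sample in product(support, repeat=n):
    weight = Fraction(1)
    for v in sample:
        weight *= probs[support.index(v)]
    xbar = sum(sample) / n
    E_m3 += weight * sum((x - xbar) ** 3 for x in sample) / n

# Equation (1): E(m3) = (n-1)(n-2)/n^2 * mu3, hence M(mu_3) = n^2/((n-1)(n-2)) m3.
assert E_m3 == Fraction((n - 1) * (n - 2), n ** 2) * mu3
```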

Step 3 is more challenging, but the problem can be reduced to finding an expression for $E\big(\bar{X}^{j_1}\,\overline{X^2}^{j_2}\,\overline{X^3}^{j_3}\cdots\big)$ since any term on the right-hand side of the Step 2 equations can be written in that form, e.g. $\frac{1}{n^5}\sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n\sum_{l=1}^n\sum_{m=1}^n X_i X_j^4 X_k^2 X_l^2 X_m \bar{X}^3 = \bar{X}^5\,\overline{X^2}^2\,\overline{X^4}$. A general solution to this problem is provided in the Umoments package (Gerlovina & Hubbard, 2019), which generates expressions for these expectations using combinatorics. This algorithm is explained in detail in Section 3.
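The reduction above is easy to sanity-check numerically; the following sketch (ours, with an arbitrary toy sample) confirms that the five-fold sum factors into a product of sample power means:

```python
from fractions import Fraction
from itertools import product

x = [Fraction(1), Fraction(-2), Fraction(3)]   # arbitrary small sample
n = len(x)

xbar = sum(x) / n
# (1/n^5) sum_{i,j,k,l,m} X_i X_j^4 X_k^2 X_l^2 X_m, times xbar^3 ...
lhs = sum(xi * xj ** 4 * xk ** 2 * xl ** 2 * xm
          for xi, xj, xk, xl, xm in product(x, repeat=5)) / n ** 5 * xbar ** 3
# ... equals xbar^5 * (mean of X^2)^2 * (mean of X^4).
rhs = xbar ** 5 * (sum(v ** 2 for v in x) / n) ** 2 * (sum(v ** 4 for v in x) / n)
assert lhs == rhs
```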

2.2. Pooled estimates

A simple extension of the method can be used to obtain unbiased estimators of central moments for samples that contain observations from several populations or categories. We demonstrate the procedure on a two-category estimation, which extends trivially to any number of categories.

For a sample $X_1, \dots, X_{n_x}, Y_1, \dots, Y_{n_y}$, $X \perp Y$, let

$m_{xk} = \frac{1}{n_x}\sum_{i=1}^{n_x}(X_i - \bar{X})^k,$

$m_{yk} = \frac{1}{n_y}\sum_{i=1}^{n_y}(Y_i - \bar{Y})^k,$ and

(2) $m_k = \frac{\sum_{i=1}^{n_x}(X_i - \bar{X})^k + \sum_{i=1}^{n_y}(Y_i - \bar{Y})^k}{n_x + n_y} = \frac{n_x m_{xk} + n_y m_{yk}}{n_x + n_y},$

where $m_k$ is a two-sample analog of the naïve biased estimator described previously. Note that pooled estimation implies an assumption of equality of the estimated central moments between the distributions of $X$ and $Y$: $\mu_{xk} = \mu_{yk} = \mu_k$, $k = 2, 3, \dots$. Using this assumption, independence of $X$ and $Y$, and the one-sample results from Step 3 in Section 2.1, we extend Step 3 of the roadmap to incorporate two variables and obtain expectations.

Example: obtain the two-sample pooled estimate of the third central moment. Using the one-sample result (1), we get

$E(m_3) = \frac{n_x E(m_{x3}) + n_y E(m_{y3})}{n_x + n_y} = \frac{n_x^2 n_y + n_x n_y^2 - 6 n_x n_y + 2 n_x + 2 n_y}{n_x n_y (n_x + n_y)}\,\mu_3 = \left[1 - \frac{2\,(3 n_x n_y - n_x - n_y)}{n_x n_y (n_x + n_y)}\right]\mu_3,$

which yields

(3) $M(\mu_3) = \frac{n_x n_y (n_x + n_y)}{n_x^2 n_y + n_x n_y^2 - 6 n_x n_y + 2 n_x + 2 n_y}\,m_3.$

For this particular example, the result matches the one-sample case if $n_x = n_y$. That is not true in general, however. All the higher orders involve powers and products of lower moments that need to be expanded before taking expectations, affecting the systems of equations. For example,

$E\big(m_2^2\big) = E\left[\left(\frac{n_x m_{x2} + n_y m_{y2}}{n_x + n_y}\right)^2\right] = \frac{n_x^2 E\big(m_{x2}^2\big) + 2 n_x n_y E(m_{x2})E(m_{y2}) + n_y^2 E\big(m_{y2}^2\big)}{(n_x + n_y)^2}.$
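The pooled expectation of $m_3$ derived above can also be verified by exact enumeration, here for $n_x = 3$, $n_y = 2$ and a small discrete distribution (a verification sketch of ours, not package code):

```python
from fractions import Fraction
from itertools import product

support = [Fraction(-1), Fraction(0), Fraction(2)]
probs = [Fraction(1, 2), Fraction(1, 5), Fraction(3, 10)]
nx, ny = 3, 2

mean = sum(p * v for v, p in zip(support, probs))
mu3 = sum(p * (v - mean) ** 3 for v, p in zip(support, probs))

# Exact E(m3) for the pooled estimator (2): first nx values are X, last ny are Y.
E_m3 = Fraction(0)
for sample in product(support, repeat=nx + ny):
    weight = Fraction(1)
    for v in sample:
        weight *= probs[support.index(v)]
    xs, ys = sample[:nx], sample[nx:]
    xbar, ybar = sum(xs) / nx, sum(ys) / ny
    m3 = (sum((x - xbar) ** 3 for x in xs)
          + sum((y - ybar) ** 3 for y in ys)) / (nx + ny)
    E_m3 += weight * m3

# Coefficient of mu3 in the display above; inverting it yields Equation (3).
coef = Fraction(nx ** 2 * ny + nx * ny ** 2 - 6 * nx * ny + 2 * nx + 2 * ny,
                nx * ny * (nx + ny))
assert E_m3 == coef * mu3
```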

3. Generating expressions for expectations

To derive general order expectations of naïve moment estimators and their powers and products, one needs to find expectations $E\big(\bar{X}^{j_1}\,\overline{X^2}^{j_2}\,\overline{X^3}^{j_3}\cdots\big)$. To build up to this, we first describe the procedure for generating $E\big(\bar{X}^k\big)$, which easily extends to $E\big(\bar{X}^k\,\overline{X^2}^l\big)$ and then to the general case involving an arbitrarily long product of terms $\overline{X^i}^{\,j_i}$.

3.1. Generate $E\big(\bar{X}^k\big)$

(4) $E\big(\bar{X}^k\big) = \frac{1}{n^k}\sum_{i_1=1}^n \sum_{i_2=1}^n \cdots \sum_{i_k=1}^n E\big(X_{i_1} X_{i_2} \cdots X_{i_k}\big).$

To find Equation (4), we need to consider all the different combinations of ordered indices $i_1, i_2, \dots, i_k$; $i_j = 1, \dots, n$ for each $j$. There are $n^k$ such combinations, but many combinations yield the same $E(X_{i_1}\cdots X_{i_k})$; for example, $E(X_2X_2X_2X_2X_2X_5X_5X_1X_1) = E(X_4X_3X_4X_6X_6X_3X_6X_6X_6) = \mu_2^2\mu_5$. Combinations that produce the same expectation form a set that we will call a grouping (similar to "partitions" and "augmented symmetric functions" in some terminology), and the problem therefore reduces to considering all the groupings (each producing a distinct expectation) and calculating their coefficients, which are the numbers of combinations in each set. Each product $X_{i_1}\cdots X_{i_k}$ can be broken into smaller products, or groups, of $X$'s with the same indices, such as $\{X_{i_j} : i_j = c\}$, $c = 1, \dots, n$. The number of groups ranges between 1 (when all the indices are the same: $i_1 = i_2 = \cdots = i_k$) and $k$ (when all the indices are different: $i_1 \neq i_2 \neq \cdots \neq i_k$); the sizes of these groups determine $E(X_{i_1}\cdots X_{i_k})$. Thus, each grouping is fully characterized by the number of groups and the group sizes.

Let $d$ denote the number of groups in one grouping $G$ and $a_1, \dots, a_d$ the numbers of $X$'s in each group, $\sum_{u=1}^d a_u = k$; the set of group sizes is unordered, so assigning indices to the $a$'s is arbitrary (e.g. decreasing). In the example above: $k = 9$, $d = 3$, $a_1 = 5$, $a_2 = 2$, and $a_3 = 2$. If $\sum_{u=1}^d I(a_u = 1) > 0$ (at least one group is of size 1), $E(X_{i_1}\cdots X_{i_k}) = 0$ since $E(X) = 0$ and there is no need to calculate a coefficient for this grouping, which is important in terms of computational efficiency; otherwise, $E(X_{i_1}\cdots X_{i_k}) = \prod_{u=1}^d \mu_{a_u}$. By adding a subscript $g$ to indicate a grouping $G$, we get

(5) $E\big(\bar{X}^k\big) = \frac{1}{n^k}\sum_{\text{all } g} C_g \prod_{u=1}^{d_g} \mu_{a_{g,u}},$

where $C_g$ is the coefficient for $G$, i.e. the number of combinations that yield $\{a_{g,u}\}$:

(6) $C_g = (n)_d\,\frac{k!}{a_{g,1}!\,a_{g,2}!\cdots a_{g,d}!\;s_{g,1}!\,s_{g,2}!\cdots},$

where $(n)_d = n(n-1)\cdots(n-d+1)$ and the $s_g$'s are the numbers of same-sized groups if there are any; e.g. for group sizes $a_1 = a_2 = 5$, $a_3 = a_4 = 4$, and $a_5 = a_6 = a_7 = 2$, we will get $s_1 = 2$, $s_2 = 2$, and $s_3 = 3$ (from these we can gather that $k = 24$, $d = 7$, and $E(X_{i_1}\cdots X_{i_k}) = \mu_5^2\mu_4^2\mu_2^3$). In this particular setting, $C_g/(n)_d$ is analogous to the partition coefficient described in the literature (Dwyer, 1938; Fisher, 1930). Going back to our original example (group sizes $\{5, 2, 2\}$), there is only one $s_g$: $s_{g,1} = 2$; the coefficient for that example is $C_g = n(n-1)(n-2)\,\frac{9!}{5!\,2!\,2!\,2!}$.

One way of arriving at the expression for $C_g$ could be the following: there are $\frac{(n)_d}{s_{g,1}!\,s_{g,2}!\cdots}$ ways to pick (unordered) indices that satisfy the given group sizes (the set $\{a_{g,u}\}$) and $\frac{k!}{a_{g,1}!\,a_{g,2}!\cdots a_{g,d}!} = \binom{k}{a_{g,1}}\binom{k-a_{g,1}}{a_{g,2}}\cdots\binom{a_{g,d-1}+a_{g,d}}{a_{g,d-1}}$ ways to place these indices on $k$ positions (a multinomial coefficient).

Our software generates expressions for $E\big(\bar{X}^k\big)$ for a given $k$ using the method described above. To find all the possible groupings, we impose an ordering on them and use it to generate each consecutive grouping when the previous one is given, thus moving through a complete set of groupings from $\{a_1 = k\}$ to $\{a_1 = a_2 = \cdots = a_k = 1\}$. For example, in an agglomerative order, the grouping $\{5, 2, 2\}$ is preceded by $\{5, 2, 1, 1\}$ and followed by $\{5, 3, 1\}$.
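The Section 3.1 machinery fits in a few lines of Python. The sketch below is ours, not the package's (the names groupings, coef, and E_xbar_pow are assumptions, and the grouping order is not the package's agglomerative one); it generates all groupings, computes $C_g$ from (6), checks that the coefficients over all groupings add up to $n^k$, and verifies (5) against an exact brute-force expectation for $k = 4$, $n = 3$:

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import factorial

def groupings(k, min_size=1):
    """All partitions of k into nondecreasing group sizes; each is a grouping."""
    if k == 0:
        return [[]]
    out = []
    for size in range(min_size, k + 1):
        for rest in groupings(k - size, size):
            out.append([size] + rest)
    return out

def falling(n, d):
    """Falling factorial (n)_d = n(n-1)...(n-d+1)."""
    result = 1
    for i in range(d):
        result *= n - i
    return result

def coef(sizes, n):
    """C_g from Equation (6)."""
    k, d = sum(sizes), len(sizes)
    c = Fraction(falling(n, d) * factorial(k))
    for a in sizes:                     # divide by each a_u!
        c /= factorial(a)
    for s in Counter(sizes).values():   # divide by s! for repeated group sizes
        c /= factorial(s)
    return c

# Every one of the n^k index combinations falls into exactly one grouping, so
# summing C_g over ALL groupings (even those with singleton groups) gives n^k.
assert sum(coef(g, 4) for g in groupings(5)) == 4 ** 5

# Equation (5): groupings with a singleton group drop out because mu_1 = 0.
support = [Fraction(-1), Fraction(2)]
probs = [Fraction(2, 3), Fraction(1, 3)]          # a mean-zero distribution
mu = {j: sum(p * v ** j for v, p in zip(support, probs)) for j in range(6)}

def E_xbar_pow(k, n):
    total = Fraction(0)
    for g in groupings(k):
        if 1 in g:
            continue
        term = coef(g, n)
        for a in g:
            term *= mu[a]
        total += term
    return total / n ** k

# Exact E(xbar^4) for n = 3 by enumerating all samples from the support.
direct = Fraction(0)
for sample in product(support, repeat=3):
    w = Fraction(1)
    for v in sample:
        w *= probs[support.index(v)]
    direct += w * (sum(sample) / 3) ** 4

assert E_xbar_pow(4, 3) == direct
```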

The smallest number of groups is $d = 1$, which produces a term of order $\frac{n}{n^k} = \frac{1}{n^{k-1}}$ (the highest order in the range); the largest $d$ with a non-zero contribution to $E(\bar{X}^k)$ is $\lfloor k/2 \rfloor$ (when the indices of $X$ appear in pairs and there are no unpaired indices; when $k$ is odd, one of the groups is of size 3), and the order it produces is $\frac{1}{n^{\lceil k/2 \rceil}}$.

3.2. Generate $E\big(\bar{X}^k\,\overline{X^2}^l\big)$

(7) $E\big(\bar{X}^k\,\overline{X^2}^l\big) = \frac{1}{n^{k+l}}\sum_{i_1=1}^n\cdots\sum_{i_k=1}^n\,\sum_{j_1=1}^n\cdots\sum_{j_l=1}^n E\big(X_{i_1}\cdots X_{i_k}\,X_{j_1}^2\cdots X_{j_l}^2\big).$

To generate expressions for Equation (7), we extend the algorithm described in Section 3.1 for Equation (4). Now groups consist of $X$'s and $X^2$'s with the same indices, $\{X_{i_s}, X_{j_t}^2 : i_s = j_t = c\}$, $c = 1, \dots, n$, and are thus described not by a single number (group size) but by a pair $(a, b)$, where $a$ is the number of $i$'s and $b$ is the number of $j$'s in the group. Consequently, a grouping in this version is characterized by a set of pairs $\{(a_u, b_u)\}$, $u = 1, \dots, d$; $\sum_{u=1}^d a_u = k$, $\sum_{u=1}^d b_u = l$, and its definition is different from the one in Section 3.1 since for given $k$ and $l$ there can be different groupings that yield the same expectation; e.g. the groupings $\{(2,3), (3,0), (1,1)\}$, $\{(4,2), (1,1), (1,1)\}$, and $\{(0,4), (3,0), (3,0)\}$ will all produce $E\big(X_{i_1}\cdots X_{i_6}\,X_{j_1}^2\cdots X_{j_4}^2\big) = \mu_3^2\mu_8$. Analogously to the original version, if $\sum_{u=1}^d I(a_u = 1, b_u = 0) > 0$ (at least one pair in the grouping is $(1,0)$), $E\big(X_{i_1}\cdots X_{i_k}\,X_{j_1}^2\cdots X_{j_l}^2\big) = 0$; otherwise, $E\big(X_{i_1}\cdots X_{i_k}\,X_{j_1}^2\cdots X_{j_l}^2\big) = \prod_{u=1}^d \mu_{a_u + 2b_u}$. Note that to account for all the possible groupings in this case, permutations need to be used, adding another layer of computational complexity.

Coefficient Cg for a grouping G is calculated in a similar way to Section 3.1 (Equation (6)) with a few adjustments:

(8) $C_g = (n)_d\,\frac{k!\,l!}{a_{g,1}!\cdots a_{g,d}!\;b_{g,1}!\cdots b_{g,d}!\;s_{g,1}!\,s_{g,2}!\cdots},$

where $s_{g,1}, s_{g,2}, \dots$ are the numbers of the groups with the same values of both $a$ and $b$.

In this case, the order ranges from $\frac{1}{n^{k+l-1}}$, when $i_1 = \cdots = i_k = j_1 = \cdots = j_l$ ($d = 1$), to $\frac{1}{n^{\lceil k/2 \rceil}}$, when all indices $i_s$ appear in pairs if $k$ is even (an "extra" index joining one of the groups if $k$ is odd), and all the $j_t$'s are different from the $i_s$'s and from each other ($d = \lfloor k/2 \rfloor + l$).

3.3. General case

The procedure in Section 3.2 easily generalizes to finding $E\big(\bar{X}^{j_1}\,\overline{X^2}^{j_2}\,\overline{X^3}^{j_3}\cdots\overline{X^m}^{j_m}\big)$ for an arbitrary $m$, with groups described by a "tuple" of length $m$ and a grouping being a collection of such tuples. Coefficients $C_g$ for groupings $G$ are calculated similarly to Equation (8), accounting for all the elements in each tuple.

4. Results (up to sixth order)

Below are the results generated with our software (the SymPy code that produces these results is in a Jupyter notebook at https://github.com/innager/unbiasedMoments):

$M(\mu_3) = \frac{n^2}{(n-1)(n-2)}\,m_3,$

$M(\mu_4) = -\frac{3n(2n-3)}{(n-1)(n-2)(n-3)}\,m_2^2 + \frac{n(n^2-2n+3)}{(n-1)(n-2)(n-3)}\,m_4,$

$M(\mu_2^2) = \frac{n(n^2-3n+3)}{(n-1)(n-2)(n-3)}\,m_2^2 - \frac{n}{(n-2)(n-3)}\,m_4,$

$M(\mu_5) = -\frac{10n^2}{(n-1)(n-3)(n-4)}\,m_2m_3 + \frac{n^2(n^2-5n+10)}{(n-1)(n-2)(n-3)(n-4)}\,m_5,$

$M(\mu_2\mu_3) = \frac{n^2(n^2-2n+2)}{(n-1)(n-2)(n-3)(n-4)}\,m_2m_3 - \frac{n^2}{(n-2)(n-3)(n-4)}\,m_5,$

$M(\mu_6) = \frac{15n^2(3n-10)}{(n-1)(n-2)(n-3)(n-4)(n-5)}\,m_2^3 - \frac{15n(n^3-8n^2+29n-40)}{(n-1)(n-2)(n-3)(n-4)(n-5)}\,m_2m_4 - \frac{40n(n^2-6n+10)}{(n-1)(n-2)(n-3)(n-4)(n-5)}\,m_3^2 + \frac{n(n^4-9n^3+31n^2-39n+40)}{(n-1)(n-2)(n-3)(n-4)(n-5)}\,m_6,$

$M(\mu_2\mu_4) = -\frac{3n^2(2n-5)}{(n-1)(n-3)(n-4)(n-5)}\,m_2^3 + \frac{n(n^4-9n^3+53n^2-135n+120)}{(n-1)(n-2)(n-3)(n-4)(n-5)}\,m_2m_4 + \frac{4n(n^2-5n+10)}{(n-1)(n-3)(n-4)(n-5)}\,m_3^2 - \frac{n(n^2-3n+8)}{(n-2)(n-3)(n-4)(n-5)}\,m_6,$

$M(\mu_3^2) = -\frac{3n^2(3n^2-15n+20)}{(n-1)(n-2)(n-3)(n-4)(n-5)}\,m_2^3 + \frac{3n(2n^3-5n^2-5n+20)}{(n-1)(n-2)(n-3)(n-4)(n-5)}\,m_2m_4 + \frac{n(n^4-8n^3+25n^2-10n-40)}{(n-1)(n-2)(n-3)(n-4)(n-5)}\,m_3^2 - \frac{n(n^2-n+4)}{(n-2)(n-3)(n-4)(n-5)}\,m_6,$

$M(\mu_2^3) = \frac{n^2(n^2-7n+15)}{(n-1)(n-3)(n-4)(n-5)}\,m_2^3 - \frac{3n(n^2-5n+10)}{(n-1)(n-3)(n-4)(n-5)}\,m_2m_4 - \frac{2n(3n^2-15n+20)}{(n-1)(n-2)(n-3)(n-4)(n-5)}\,m_3^2 + \frac{2n}{(n-3)(n-4)(n-5)}\,m_6.$

For two-sample pooled estimators up to the sixth order, refer to the Umoments package (Gerlovina & Hubbard, 2019) and the Sage worksheet at https://github.com/innager/unbiasedMoments.

5. Umoments R package

Umoments contains a set of pre-programmed functions that calculate one-sample and pooled two-sample unbiased moment estimates, both up to the sixth order. This functionality is primarily useful for data analysis. The estimates can be calculated either directly from the sample or from naïve biased estimates, in which case the sample size $n$ needs to be provided. For two-sample estimation, the input should also include labels indicating which observation belongs to which sample/category, or both sample sizes $n_x$ and $n_y$. Below are some examples.

Two-sample pooled estimates from the data up to sixth order (note that smp is a data vector, and treatment is a vector of labels that separates it into two categories):

> uMpool(smp, treatment, 6)

       M2        M3    M2pow2        M4      M2M3        M5    M2pow3
1.6443027 1.5188515 2.4878505 6.9794503 2.0615514 17.0989234 3.5236856
   M3pow2      M2M4        M6
0.6674177 9.4046220 56.6016025

Unbiased estimate of $\mu_3^2$ from naïve biased second, third, fourth, and sixth moment estimates:

> uM3pow2(m[2], m[3], m[4], m[6], n)

[1] -10.00696

Other functions in the package can be used to obtain higher-order estimators, pooled estimators across multiple (three or more) samples, and other statistical results.

Generate $E\big(\bar{X}^3\,\overline{X^3}^2\,\overline{X^4}\big)$ for a sample $X_1, \dots, X_{n_x}$ (the output is a string that can be used as a code chunk, fed into a computer algebra system, or converted into LaTeX):

> one_combination(c(3, 0, 2, 1), "n_x")

[1] "(1*n_x*mu13^1 + 3*n_x*(n_x-1)*mu2^1*mu11^1 + 3*n_x*(n_x-1)*(n_x-2)*(n_x-3)*mu5^1*mu3^2*mu2^1 + 6*n_x*(n_x-1)*(n_x-2)*(n_x-3)*mu4^2*mu3^1*mu2^1 + 6*n_x*(n_x-1)*(n_x-2)*mu8^1*mu3^1*mu2^1 + 9*n_x*(n_x-1)*(n_x-2)*mu7^1*mu4^1*mu2^1 + 3*n_x*(n_x-1)*(n_x-2)*mu6^1*mu5^1*mu2^1 + 3*n_x*(n_x-1)*mu3^1*mu10^1 + 1*n_x*(n_x-1)*(n_x-2)*(n_x-3)*mu4^1*mu3^3 + 3*n_x*(n_x-1)*(n_x-2)*mu7^1*mu3^2 + 9*n_x*(n_x-1)*(n_x-2)*mu6^1*mu4^1*mu3^1 + 6*n_x*(n_x-1)*(n_x-2)*mu5^2*mu3^1 + 12*n_x*(n_x-1)*(n_x-2)*mu5^1*mu4^2 + 7*n_x*(n_x-1)*mu9^1*mu4^1 + 9*n_x*(n_x-1)*mu8^1*mu5^1 + 6*n_x*(n_x-1)*mu7^1*mu6^1)/n_x^6"

Generate groupings for $k = 5$ (see Section 3.1):

> Umoments:::groups(5)

[[1]]
[1] 1 1 1 1 1

[[2]]
[1] 2 1 1 1

[[3]]
[1] 2 2 1

[[4]]
[1] 3 1 1

[[5]]
[1] 3 2

[[6]]
[1] 4 1

[[7]]
[1] 5

For further details and examples, refer to the package vignette and documentation (Gerlovina & Hubbard, 2019).

6. Discussion

The difference between unbiased and biased estimators depends on the sample size and might be considerable for small samples; for a fixed sample size, it is also relatively greater for higher orders. At the same time, variability of the estimators is an important factor in this bias-variance trade-off, especially in connection with the sample size $n$ and the order of the estimators, as variability increases with higher orders (which might be offset by the lower contribution/weight of these orders in certain methods) and smaller samples. Another question is the relationship between $n$ and the maximal order that could reasonably be used in a method; besides the purely algebraic restriction on the sample size for a given order, apparent from the expressions for the unbiased estimators ($n \geq k$ for $k$th-order estimators), there might be another, stricter underlying relationship that needs to be explored, either theoretically or numerically.

While unbiased estimators of products and integer powers of moments are possible to obtain, that is not the case with ratios and roots. Of course, such biased estimators, like the square root $s$ of the sample variance $s^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$ or a skewness estimator, are widely used in practice. Adding to the complexity is the fact that, since an unbiased estimator of a ratio cannot be obtained, simplifying expressions should also be questioned; consider, for example, the scaled sixth cumulant:

$\lambda_6 = \frac{\kappa_6}{\mu_2^3} = \frac{\mu_6 - 15\mu_2\mu_4 - 10\mu_3^2 + 30\mu_2^3}{\mu_2^3} = \frac{\mu_6}{\mu_2^3} - 15\,\frac{\mu_4}{\mu_2^2} - 10\,\frac{\mu_3^2}{\mu_2^3} + 30.$
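As a quick check of this identity, plugging in the central moments of a normal distribution ($\mu_2 = \sigma^2$, $\mu_3 = 0$, $\mu_4 = 3\sigma^4$, $\mu_6 = 15\sigma^6$) should give $\lambda_6 = 0$ in both forms:

```python
from fractions import Fraction

s2 = Fraction(2)   # arbitrary variance sigma^2
mu2, mu3, mu4, mu6 = s2, Fraction(0), 3 * s2 ** 2, 15 * s2 ** 3

# kappa_6 = mu_6 - 15 mu_2 mu_4 - 10 mu_3^2 + 30 mu_2^3 vanishes for the normal.
kappa6 = mu6 - 15 * mu2 * mu4 - 10 * mu3 ** 2 + 30 * mu2 ** 3
lam6 = kappa6 / mu2 ** 3
assert lam6 == 0
assert lam6 == mu6 / mu2 ** 3 - 15 * mu4 / mu2 ** 2 - 10 * mu3 ** 2 / mu2 ** 3 + 30
```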

For the closest estimate, it is natural to consider the ratio of an unbiased cumulant estimator $M(\kappa_6)$ and an unbiased scaling factor $M(\mu_2^3)$. Then, is $M(\mu_2\mu_4)\big/M(\mu_2^3)$ preferable to $M(\mu_4)\big/M(\mu_2^2)$ for the second term?

This example also provides an illustration of another important consideration that should factor into the decision of which estimators to use: variability of the denominator in studentized statistics. In $\lambda_6$, the sixth cumulant $\kappa_6$ is scaled by $\mu_2^3$; to substitute for this unknown quantity, a variety of estimators can be used: $m_2^3$, $M(\mu_2)^3$, or $M(\mu_2^3)$, to name a few. While the expression for $M(\mu_2)$ (and thus its cube) contains $m_2$ only, the expression for $M(\mu_2^3)$ includes $m_4$ and $m_6$ as well. These higher-order quantities may be highly variable, especially in small samples, and the whole ratio therefore becomes highly sensitive to small values of the estimates in the denominator, which can inflate the estimate of $\lambda_6$ dramatically, increasing variability of the ratio to the point of unusability. Therefore, it might be advisable in certain cases, e.g. with considerably skewed distributions, to perform some numeric exploration to determine whether it might indeed be preferable to use lower-order estimators, naïve biased or unbiased, in place of the parameters in denominators because of their relative stability ("power of mean" instead of "mean of power").

Correction

This article has been republished with minor changes. These changes do not impact the academic content of the article.

Additional information

Funding

This work was supported by NIH under Grant [P42ES004705].

Notes on contributors

Inna Gerlovina

Inna Gerlovina is a postdoctoral scholar at the University of California, San Francisco. She has worked on small sample inference and error rate control as well as higher-order inferential approaches, developing software packages for high-dimensional data analysis. Her current interests include development, implementation, and application of statistical methods that contribute to the understanding of malaria epidemiology and transmission. Inna completed her MA and PhD in Biostatistics at the University of California, Berkeley.

Alan E. Hubbard

Alan Hubbard, Professor of Biostatistics, University of California, Berkeley, is director of the Biomedical Big Data pre-doctoral training program, co-director of the Center of Targeted Learning, and head of the computational biology Core E of the SuperFund Center at UC Berkeley (NIH/EPA), as well as a consulting statistician on several federally funded and foundation projects. He has worked on projects ranging from molecular biology of aging to epidemiology and infectious disease modeling, but most of his work has focused on semi-parametric estimation in high-dimensional data. His current methods research focuses on precision medicine, variable importance, statistical inference for data-adaptive parameters, and statistical software implementing targeted learning methods. He is currently working in several areas of applied research, including early childhood development in developing countries, patient outcomes from acute trauma, environmental genomics, and comparative effectiveness research in diabetes care.

References

  • Bickel, P. (1974). Edgeworth expansions in nonparametric statistics. The Annals of Statistics, 2, 1–11. doi:10.1214/aos/1176342609
  • Cramér, H. (2016). Mathematical methods of statistics (PMS-9) (Vol. 9). Princeton, NJ: Princeton University Press.
  • Dwyer, P. S. (1938). Combined expansions of products of symmetric power sums and of sums of symmetric power products with application to sampling. The Annals of Mathematical Statistics, 9(1), 1–47. doi:10.1214/aoms/1177732357
  • Fisher, R. A. (1930). Moments and product moments of sampling distributions. Proceedings of the London Mathematical Society, 2(1), 199–238. doi:10.1112/plms/s2-30.1.199
  • Gerlovina, I., & Hubbard, A. E. (2019). Umoments: Unbiased central moment estimates. R package version 0.1.0. Retrieved from https://CRAN.R-project.org/package=Umoments
  • Hall, P. (1987). Edgeworth expansion for Student's t statistic under minimal moment conditions. The Annals of Probability, 15(3), 920–931. doi:10.1214/aop/1176992073
  • Pebay, P. P., Terriberry, T., Kolla, H., & Bennett, J. C. (2014). Formulas for the computation of higher-order central moments. Livermore, CA: Sandia National Laboratories (SNL-CA).
  • Putter, H., & van Zwet, W. R. (1998). Empirical Edgeworth expansions for symmetric statistics. The Annals of Statistics, 26(4), 1540–1569. doi:10.1214/aos/1024691253
  • Rimoldini, L. (2014). Weighted skewness and kurtosis unbiased by sample size and Gaussian uncertainties. Astronomy and Computing, 5, 1–8. doi:10.1016/j.ascom.2014.02.001
  • Rose, C., & Smith, M. (2002). Mathematical statistics with Mathematica, chapter 7: Moments of sampling distributions. New York: Springer-Verlag.
  • Tracy, D., & Gupta, B. (1974). Generalized h-statistics and other symmetric functions. The Annals of Statistics, 2(4), 837–844. doi:10.1214/aos/1176342774