Research Article

The iteratile: A new measure of central tendency

Article: 1149135 | Received 27 Oct 2015, Accepted 26 Jan 2016, Published online: 26 Feb 2016

Abstract

This article defines a new measure of central tendency (called the “iteratile”) by iterating a function which maps a triple of data into a triple of the median, mean, and trimean (sorted). An explicit formula is given with proof, along with brief discussion of potential applications.

Public Interest Statement

Often, small amounts of data and outliers make it difficult to do statistics. To measure the “typical” behavior, we usually use means or medians, but sometimes even those have problems. We offer an alternative way of measuring the center here.

The ideas in DiMarco and Savitz (2012, 2013) take $N$ values of data and study all their $m$-tiles, which separate the data into $m$ sections, for $2 \le m \le N+1$ (where the 2-tile is the median, the 4-tiles are quartiles, and the $(N+1)$-tile is the sample mean). Observing that $N$ values of data have $N$-many $m$-tiles, the author wondered whether mapping all the data to their $m$-tiles and iterating would have a limit, and if so, what that limit would be.

In the case $N = 3$, we can get a formula. More precisely, given data $(a,b,c)$ with $a \le b \le c$, define $\tilde{f}(a,b,c) = \left(b, \frac{a+2b+c}{4}, \frac{a+b+c}{3}\right)$, and then call $f(a,b,c)$ the sorted version of $\tilde{f}(a,b,c)$, where the components are in increasing order. Denote $f$ composed with itself $n$ times by $f^{(n)}$.
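The map just defined can be sketched in a few lines of Python. This is only an illustration: the names `step` and `iterate` are hypothetical, and exact rational arithmetic (`fractions.Fraction`) is an assumption made here to avoid rounding noise.

```python
from fractions import Fraction

def step(a, b, c):
    """Apply the sorted map once: (median, trimean, mean) in increasing order."""
    a, b, c = sorted((Fraction(a), Fraction(b), Fraction(c)))
    return tuple(sorted((b, (a + 2 * b + c) / 4, (a + b + c) / 3)))

def iterate(triple, n):
    """Compose the sorted map with itself n times."""
    for _ in range(n):
        triple = step(*triple)
    return triple

print([str(x) for x in iterate((0, 0, 24), 1)])  # ['0', '6', '8']
print([str(x) for x in iterate((0, 0, 24), 2)])  # ['14/3', '5', '6']
```

After only two applications the triple $(0,0,24)$ has already contracted to an interval of length $4/3$.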

Theorem 1

$\lim_{n\to\infty} f^{(n)}(a,b,c)$ exists and is of the form $(I,I,I)$, where $I = \frac{3a+8b+3c}{14}$. (We call the limit the “iteratile” in what follows.)

Proof

If a=b=c, we are done, so assume a<b<c.

Note first that the limit exists; $f$ is a contraction mapping since each of $b$, $\frac{a+2b+c}{4}$, and $\frac{a+b+c}{3}$ must be greater than $a$ and less than $c$.

We find the general formula with the aid of two lemmas. The first allows us to focus only on the behavior of the middle term, and the second shows by induction how the middle term exhibits geometric series behavior.

Lemma 1

$f(a,b,c)$ takes either the form $\left(b, \frac{a+2b+c}{4}, \frac{a+b+c}{3}\right)$ or $\left(\frac{a+b+c}{3}, \frac{a+2b+c}{4}, b\right)$.

Proof

By cases: Case 1: $b-a = c-b$.

The assumption is equivalent to $2b = a+c$, so $\frac{a+2b+c}{4} = \frac{4b}{4} = b$, and a similar argument yields $\frac{a+b+c}{3} = b$. Thus, $f(a,b,c) = (b,b,b)$, so $I = b$.

Case 2: $b-a > c-b$.

The assumption is equivalent to $2b > a+c$. Thus, $\frac{a+2b+c}{4} < \frac{2b+2b}{4} = b$ and $\frac{a+b+c}{3} < \frac{3b}{3} = b$. Further, $\frac{a+b+c}{3} - \frac{a+2b+c}{4} = \frac{a-2b+c}{12}$, which is negative by assumption. Thus, $b > \frac{a+2b+c}{4} > \frac{a+b+c}{3}$.

Case 3: $b-a < c-b$. Similar arguments to Case 2 yield $b < \frac{a+2b+c}{4} < \frac{a+b+c}{3}$.
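The three cases can be spot-checked numerically. The sketch below is not part of the proof; the helper names `image` and `ordering` are hypothetical, and the sample triples are chosen here only to hit each case once.

```python
from fractions import Fraction

def image(a, b, c):
    """Unsorted image (median, trimean, mean) of a triple with a <= b <= c."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    return (b, (a + 2 * b + c) / 4, (a + b + c) / 3)

def ordering(a, b, c):
    """Classify how the unsorted image is ordered."""
    med, tri, mean = image(a, b, c)
    if med == tri == mean:
        return "constant"
    if med < tri < mean:
        return "ascending"
    if med > tri > mean:
        return "descending"
    return "other"

print(ordering(1, 2, 3))  # Case 1 (b-a = c-b): constant
print(ordering(0, 5, 6))  # Case 2 (b-a > c-b): descending
print(ordering(0, 1, 6))  # Case 3 (b-a < c-b): ascending
```

In every case the trimean sits in the middle position, which is what allows the rest of the argument to track only the middle term.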

Thus, it remains to study the difference from one iteration to the next in the middle term. Now, call $f^{(n)}(a,b,c) = (a_n, b_n, c_n)$ for convenience (so $a_0 = a$, etc.). We now give a formula for $b_{n+1} - b_n$ which will quickly complete the proof.

Lemma 2

For any $n \in \mathbb{N}$, $b_{n+1} - b_n = \frac{a+c-2b}{4(-6)^n}$.

Proof

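By induction, we see that for $n=0$:
$$b_1 - b_0 = \frac{a+2b+c}{4} - b = \frac{1}{4}(a+c-2b) = \frac{1}{4}\,\frac{a+c-2b}{(-6)^0}.$$

For $n=1$, first observe by Lemma 1 that
$$b_2 = \frac{1}{4}\,\frac{5a+14b+5c}{6}, \tag{1}$$

since (without loss of generality) by Lemma 1,
$$b_2 = \frac{a_1+2b_1+c_1}{4} = \frac{1}{4}\left[\frac{a+b+c}{3} + 2\cdot\frac{a+2b+c}{4} + b\right] = \frac{1}{4}\,\frac{5a+14b+5c}{6}.$$

So,
$$b_2 - b_1 = \frac{1}{4}\,\frac{5a+14b+5c}{6} - \frac{a+2b+c}{4} = \frac{1}{4}\,\frac{a+c-2b}{(-6)^1}. \tag{2}$$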

Now, suppose that $b_{n+1} - b_n = \frac{a+c-2b}{4(-6)^n}$ for an arbitrary $n \in \mathbb{N}$. We want to show that it holds for $n+1$, so we study $b_{n+2} - b_{n+1}$.

An argument similar to (1) gives the relation:
$$b_{n+2} - b_{n+1} = \frac{1}{4}\,\frac{a_n + c_n - 2b_n}{(-6)}. \tag{3}$$

So, Lemma 1 again gives
$$b_{n+2} - b_{n+1} = \frac{1}{4}\left[a_{n+1} - a_n + 2(b_{n+1} - b_n) + c_{n+1} - c_n\right], \tag{4}$$

and it seems we need to study $a_{n+1} - a_n$ and $c_{n+1} - c_n$ to continue.

Again, Lemma 1 tells us that $a_{n+1} - a_n$ will either be of the form $b_n - a_n$ or $\frac{a_n+b_n+c_n}{3} - a_n$, and similarly for $c_{n+1} - c_n$. Luckily, we need to only consider the sum of $a_{n+1} - a_n$ and $c_{n+1} - c_n$, so without loss of generality we know,
$$a_{n+1} - a_n + c_{n+1} - c_n = \frac{a_n+b_n+c_n}{3} + b_n - a_n - c_n = \frac{1}{6}\left[8b_n - 4a_n - 4c_n\right]. \tag{5}$$

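Now we have what we need to prove the claim. By (4),
$$b_{n+2} - b_{n+1} = \frac{1}{4}\left[a_{n+1} - a_n + 2(b_{n+1} - b_n) + c_{n+1} - c_n\right],$$

and by (5) and the induction hypothesis,
$$b_{n+2} - b_{n+1} = \frac{1}{4}\left[\frac{1}{6}(8b_n - 4a_n - 4c_n) + 2\cdot\frac{1}{4}\,\frac{a+c-2b}{(-6)^n}\right] = \frac{1}{6}(2b_n - a_n - c_n) + \frac{1}{8}\,\frac{a+c-2b}{(-6)^n}.$$

Using (3), which gives $\frac{1}{6}(2b_n - a_n - c_n) = 4(b_{n+2} - b_{n+1})$, we get the equation
$$b_{n+2} - b_{n+1} = 4(b_{n+2} - b_{n+1}) + \frac{1}{8}\,\frac{a+c-2b}{(-6)^n}.$$

We can write this as:
$$-3(b_{n+2} - b_{n+1}) = \frac{1}{8}\,\frac{a+c-2b}{(-6)^n},$$

which reduces to
$$b_{n+2} - b_{n+1} = \frac{a+c-2b}{4(-6)^{n+1}},$$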

and this proves the claim.
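As a sanity check on Lemma 2, the differences of the middle term can be computed exactly and compared with the stated formula. This is a minimal sketch, not a substitute for the induction; the function names `middle_differences` and `predicted` are hypothetical, and exact fractions are an assumption made to avoid floating-point error.

```python
from fractions import Fraction

def step(triple):
    """Apply the sorted map once to an already sorted triple."""
    a, b, c = triple
    return tuple(sorted((b, (a + 2 * b + c) / 4, (a + b + c) / 3)))

def middle_differences(a, b, c, steps):
    """Return [b_1 - b_0, b_2 - b_1, ...] for the given number of steps."""
    triple = tuple(sorted(Fraction(x) for x in (a, b, c)))
    diffs = []
    for _ in range(steps):
        nxt = step(triple)
        diffs.append(nxt[1] - triple[1])
        triple = nxt
    return diffs

def predicted(a, b, c, n):
    """Right-hand side of Lemma 2: (a + c - 2b) / (4 * (-6)^n)."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    return (a + c - 2 * b) / (4 * Fraction(-6) ** n)

print([str(d) for d in middle_differences(0, 0, 24, 3)])  # ['6', '-1', '1/6']
```

Each difference is the previous one divided by $-6$, which is the geometric behavior the lemma asserts.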

The upshot of Lemma 2 is that each subsequent iteration “steals $\frac{1}{6}$ from $a$ (and from $c$) and gives it to $b$,” which means we can set up the equation
$$\sum_{n=0}^{\infty} (b_{n+1} - b_n) = \sum_{n=0}^{\infty} \frac{a+c-2b}{4(-6)^n}.$$

The left-hand side is a telescoping series, and we know that $\lim_{n\to\infty} b_n = I$. Also, the right-hand side is a simple geometric series, thus the above becomes
$$I - b = \frac{3}{14}(a+c-2b),$$

or
$$I = \frac{1}{14}(3a+8b+3c),$$

as claimed.
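The summation step can also be illustrated by accumulating partial sums of the geometric series and watching them approach the closed form. The sketch below assumes the formula of Lemma 2 and uses hypothetical names `iteratile` and `partial_sum`; it does not iterate the map itself.

```python
from fractions import Fraction

def iteratile(a, b, c):
    """Closed form from Theorem 1: (3a + 8b + 3c) / 14."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    return (3 * a + 8 * b + 3 * c) / 14

def partial_sum(a, b, c, k):
    """b plus the first k terms of the series with terms (a+c-2b) / (4 * (-6)^n)."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    return b + sum((a + c - 2 * b) / (4 * Fraction(-6) ** n) for n in range(k))

for k in range(4):
    print(k, partial_sum(0, 0, 24, k))
# 0 0
# 1 6
# 2 5
# 3 31/6
print(iteratile(0, 0, 24))  # 36/7
```

The partial sums alternate around $36/7 \approx 5.143$, with the error shrinking by a factor of 6 at each step.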

Can this be extended to larger $N$? Unfortunately, there does not exist a universally agreed-upon notion of “percentile”; different definitions will yield different iteratiles. Thus, without specifying exactly which notion is desired, one cannot expect a general result. It seems that one gets a “symmetric convex combination” in any case.

How “good” or “useful” is the iteratile? One potential use is when there are errors in the data, so the data-set changes. What measure of central tendency minimizes the difference? Results along these lines are discussed in DiMarco, Hollingsworth, and Savitz (2015) in the context of z-scores and normal distributions.

For example, if the data-set $(0,0,24)$ were changed to $(0,3,16)$, then the difference in the iteratiles would be zero. But of course, one is not allowed to see the data change before selecting the appropriate measure. How does one pick a central tendency measure a priori to minimize the difference in measure for an arbitrary data change? There is a general trend: when data in the middle are (expected to be) altered, use the mean, and when data at the extremes are altered, use the median. What should be used in the “in-between” situations? At least relative to the trimean, the iteratile should be used if values toward the extremes are altered more, and the trimean when values toward the middle are altered more. Also, it seems that skewed alterations suggest use of the iteratile.
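The example above can be reproduced with a short script that reports how each measure moves when the data change. This is only a sketch; the helper names `measures` and `shifts` are hypothetical.

```python
from fractions import Fraction

def measures(data):
    """Median, mean, trimean, and iteratile of three values."""
    a, b, c = sorted(Fraction(x) for x in data)
    return {
        "median": b,
        "mean": (a + b + c) / 3,
        "trimean": (a + 2 * b + c) / 4,
        "iteratile": (3 * a + 8 * b + 3 * c) / 14,
    }

def shifts(before, after):
    """Change in each measure when the data-set goes from before to after."""
    m0, m1 = measures(before), measures(after)
    return {name: m1[name] - m0[name] for name in m0}

for name, delta in shifts((0, 0, 24), (0, 3, 16)).items():
    print(name, delta)
# median 3
# mean -5/3
# trimean -1/2
# iteratile 0
```

For this particular alteration the iteratile is unchanged, while the other three measures all move.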

In general, it is quite crude to have to resort to the median, as so much information is unaccounted for. The iteratile gives a bit more of that information and reduces the effects of outliers. As one piece of data gets arbitrarily large, the non-median measures will go to infinity, but at different rates (mean: 1/3, trimean: 1/4, iteratile: 3/14). As a result, perhaps using the iteratile is a decent alternative to the median. But still, is there anything special about the iteratile? One could, for example, merely make a measure of central tendency like $\frac{3a+10b+3c}{16}$, which would do even better ... but would it be optimal? Is there really something coming from the limit? These are complicated, longstanding, and potentially unanswerable questions, and even if the iteratile itself turns out to be inferior, perhaps the idea of the iteratile may lead to other more useful alternatives and generalizations. At the very least, those who appreciate calculus may simply enjoy the result at a purely esthetic level.
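The rates quoted above are simply the weight each measure places on the largest value, as the following sketch shows by writing every measure as a weighted sum $w_a a + w_b b + w_c c$. The `WEIGHTS` table and the function names are hypothetical, and the calculation assumes $c$ stays the largest value after it is increased.

```python
from fractions import Fraction as F

# Each measure as weights (w_a, w_b, w_c) on a sorted triple a <= b <= c.
WEIGHTS = {
    "median": (F(0), F(1), F(0)),
    "mean": (F(1, 3), F(1, 3), F(1, 3)),
    "trimean": (F(1, 4), F(1, 2), F(1, 4)),
    "iteratile": (F(3, 14), F(8, 14), F(3, 14)),
    "(3a+10b+3c)/16": (F(3, 16), F(10, 16), F(3, 16)),
}

def measure(name, a, b, c):
    """Evaluate the named measure on a sorted triple."""
    wa, wb, wc = WEIGHTS[name]
    return wa * a + wb * b + wc * c

def outlier_rate(name, a, b, c, shift=1000):
    """Change in the measure per unit increase of the largest value c."""
    return (measure(name, a, b, c + shift) - measure(name, a, b, c)) / shift

for name in WEIGHTS:
    print(name, outlier_rate(name, 0, 3, 16))
# median 0
# mean 1/3
# trimean 1/4
# iteratile 3/14
# (3a+10b+3c)/16 3/16
```

Every row of the table is a symmetric convex combination, so by this criterion alone any such combination with a smaller outer weight would respond more slowly to an outlier than the iteratile does.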

Additional information

Funding

The author received no direct funding for this research.

Notes on contributors

Blane Hollingsworth

Blane Hollingsworth received his PhD from Auburn University in stochastic differential equations, and is currently a visiting scholar at Indiana University. Currently, he is studying alternatives to means and medians as measures of central tendency, especially for small sets of data with outliers/errors.

References

  • DiMarco, D., & Savitz, R. (2012). The M-tile means, A new class of measures of central tendency: Theory and applications. JIMS, 12, 48–56.
  • DiMarco, D., & Savitz, R. (2013). The M-tile deviation: A new class of measures of dispersion. International Journal of Business Research, 13, 117–124.
  • DiMarco, D., Hollingsworth, B., & Savitz, R. (2015). On resistant versions of the standard score. European Journal of Marketing, 15, 7–16.