The iteratile: A new measure of central tendency

Abstract

This article defines a new measure of central tendency (called the “iteratile”) by iterating a function which maps a triple of data into a triple of the median, mean, and trimean (sorted). An explicit formula is given with proof, along with brief discussion of potential applications.

Public Interest Statement

Often, small amounts of data and outliers make it difficult to do statistics. To measure the “typical” behavior, we usually use means or medians, but sometimes even those have problems. We offer an alternative way of measuring the center here.

The ideas in DiMarco and Savitz (2012, 2013) take N values of data and study all their m-tiles, which separate the data into m sections, for $2 \leq m \leq N + 1$ (the 2-tile is the median, the 4-tiles are the quartiles, and the $(N + 1)$-tile is the sample mean). Observing that N values of data have N-many m-tiles, the author wondered whether mapping all the data to their m-tiles and iterating would have a limit and, if so, what that limit would be.

In the case N = 3, we can get a formula. More precisely, given data (a, b, c), for $a \leq b \leq c$ , define $f^{*} (a, b, c) = (b, \frac{a + 2 b + c}{4}, \frac{a + b + c}{3})$ , and then call f(a, b, c) the sorted version of $f^{*}$ , where the components are in increasing order. Denote f composed with itself n times by $f^{(n)}$ .
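The map f can be sketched in a few lines of Python. This is only an illustrative sketch: the names `f_star`, `f`, and `iterate` are chosen here and do not come from the article, and exact rational arithmetic (`fractions.Fraction`) is an assumption made to avoid rounding error.

```python
from fractions import Fraction

def f_star(a, b, c):
    """Map a sorted triple to (median, trimean, mean), left unsorted."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    return (b, (a + 2 * b + c) / 4, (a + b + c) / 3)

def f(a, b, c):
    """Sorted version of f_star; the input is sorted first for safety."""
    a, b, c = sorted((a, b, c))
    return tuple(sorted(f_star(a, b, c)))

def iterate(triple, n):
    """Apply f to the triple n times, i.e. compute f^(n)."""
    for _ in range(n):
        triple = f(*triple)
    return triple
```

For instance, `f(0, 0, 24)` gives the triple (0, 6, 8), and `iterate((0, 0, 24), 2)` gives (14/3, 5, 6).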

Theorem 1

$\lim_{n \to \infty} f^{(n)} (a, b, c)$ exists and is of the form $(I, I, I)$ , where $I = \frac{3 a + 8 b + 3 c}{14}$ . (We call the limit the “iteratile” in what follows.)

Proof

If $a = b = c$ , we are done, so assume $a < b < c$ .

Note first that the limit exists; f is a contraction mapping since each of $b, \frac{a + 2 b + c}{4}$ , and $\frac{a + b + c}{3}$ must be greater than a and less than c.

We find the general formula with the aid of two lemmas. The first allows us to focus only on the behavior of the middle term, and the second shows by induction how the middle term exhibits geometric series behavior.

Lemma 1

f(a, b, c) takes either the form: $(b, \frac{a + 2 b + c}{4}, \frac{a + b + c}{3})$ or $(\frac{a + b + c}{3}, \frac{a + 2 b + c}{4}, b)$ .

Proof

By cases: Case 1: $b - a = c - b$ .

The assumption is equivalent to $2 b = a + c$ , so $\frac{a + 2 b + c}{4} = \frac{4 b}{4} = b$ , and a similar argument yields $\frac{a + b + c}{3} = b$ . Thus, $f (a, b, c) = (b, b, b)$ , so $I = b$ .

Case 2: $b - a > c - b$ .

The assumption is equivalent to $2 b > a + c$ . Thus, $\frac{a + 2 b + c}{4} < \frac{2 b + 2 b}{4} = b$ and $\frac{a + b + c}{3} < \frac{3 b}{3} = b$ . Further, $\frac{a + b + c}{3} - \frac{a + 2 b + c}{4} = \frac{a - 2 b + c}{12}$ , which is negative by assumption. Thus, $b > \frac{a + 2 b + c}{4} > \frac{a + b + c}{3} .$

Case 3: $b - a < c - b$ . Similar arguments to Case 2 yield: $b < \frac{a + 2 b + c}{4} < \frac{a + b + c}{3} .$ $□$
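Lemma 1 can also be seen from the identity $\frac{a + 2 b + c}{4} = \frac{1}{4} b + \frac{3}{4} \cdot \frac{a + b + c}{3}$ , which places the trimean between the median and the mean. The sketch below checks this by brute force on small integer triples; the function name `middle_is_trimean` and the range 0 to 20 are arbitrary choices made for illustration.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

def middle_is_trimean(a, b, c):
    """Check that the trimean is the middle entry of the sorted image of (a, b, c)."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    median, trimean, mean = b, (a + 2 * b + c) / 4, (a + b + c) / 3
    return sorted((median, trimean, mean))[1] == trimean

# Exhaustive check over all sorted integer triples a <= b <= c with entries in 0..20.
all_ok = all(
    middle_is_trimean(a, b, c)
    for a, b, c in combinations_with_replacement(range(21), 3)
)
```

A finite check like this is not a proof, but it is a quick way to catch a slip in the case analysis.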

Thus, it remains to study the difference from one iteration to the next in the middle term. Now, call $f^{(n)} (a, b, c) = (a_{n}, b_{n}, c_{n})$ for convenience (so $a_{0} = a$ , etc.). We now give a formula for $b_{n + 1} - b_{n}$ which will quickly complete the proof.

Lemma 2

For any $n \in \mathbb{N}$ (including $n = 0$ ), $b_{n + 1} - b_{n} = \frac{a + c - 2 b}{4 {(- 6)}^{n}}$ .

Proof

We argue by induction. For $n = 0$ : $b_{1} - b_{0} = \frac{a + 2 b + c}{4} - b = \frac{1}{4} (a + c - 2 b) = \frac{1}{4} \frac{a + c - 2 b}{{(- 6)}^{0}} .$

For $n = 1$ , first observe by Lemma 1 that $b_{2} = \frac{1}{4} \frac{5 a + 14 b + 5 c}{6},$ (1)

since (without loss of generality) by Lemma 1, $b_{2} = \frac{a_{1} + 2 b_{1} + c_{1}}{4} = \frac{1}{4} [\frac{a + b + c}{3} + \frac{a + 2 b + c}{2} + b] = \frac{1}{4} \frac{5 a + 14 b + 5 c}{6} .$

So, $b_{2} - b_{1} = \frac{1}{4} \frac{5 a + 14 b + 5 c}{6} - \frac{a + 2 b + c}{4} = \frac{1}{4} \frac{a + c - 2 b}{{(- 6)}^{1}} .$ (2)

Now, suppose that $b_{n + 1} - b_{n} = \frac{a + c - 2 b}{4 {(- 6)}^{n}}$ for an arbitrary $n \in \mathbb{N}$ . We want to show that it holds for $n + 1$ , so we study $b_{n + 2} - b_{n + 1}$ .

An argument similar to (1) and (2), applied to $(a_{n}, b_{n}, c_{n})$ in place of $(a, b, c)$ , gives the relation: $b_{n + 2} - b_{n + 1} = \frac{1}{4} \frac{a_{n} + c_{n} - 2 b_{n}}{(- 6)} .$ (3)

Also, Lemma 1 again gives $b_{n + 2} - b_{n + 1} = \frac{1}{4} [a_{n + 1} - a_{n} + 2 (b_{n + 1} - b_{n}) + c_{n + 1} - c_{n}],$ (4)

and it seems we need to study $a_{n + 1} - a_{n}$ and $c_{n + 1} - c_{n}$ to continue.

Again, Lemma 1 tells us that $a_{n + 1} - a_{n}$ will either be of the form $b_{n} - a_{n}$ or $\frac{a_{n} + b_{n} + c_{n}}{3} - a_{n}$ , and similarly for $c_{n + 1} - c_{n}$ . Luckily, we only need to consider the sum of $a_{n + 1} - a_{n}$ and $c_{n + 1} - c_{n}$ , so without loss of generality we know $a_{n + 1} - a_{n} + c_{n + 1} - c_{n} = \frac{a_{n} + b_{n} + c_{n}}{3} + b_{n} - a_{n} - c_{n} = \frac{1}{6} [8 b_{n} - 4 a_{n} - 4 c_{n}] .$ (5)

Now we have what we need to prove the claim. By (4), $b_{n + 2} - b_{n + 1} = \frac{1}{4} [a_{n + 1} - a_{n} + 2 (b_{n + 1} - b_{n}) + c_{n + 1} - c_{n}],$

and by (5) and the induction hypothesis, this equals $\frac{1}{4} [\frac{1}{6} (8 b_{n} - 4 a_{n} - 4 c_{n}) + 2 (\frac{1}{4} \frac{a + c - 2 b}{{(- 6)}^{n}})] = \frac{1}{6} (2 b_{n} - a_{n} - c_{n}) + \frac{1}{8} \frac{a + c - 2 b}{{(- 6)}^{n}} .$

By (3), $\frac{1}{6} (2 b_{n} - a_{n} - c_{n}) = 4 (b_{n + 2} - b_{n + 1})$ , so we get the equation $b_{n + 2} - b_{n + 1} = 4 (b_{n + 2} - b_{n + 1}) + \frac{1}{8} \frac{a + c - 2 b}{{(- 6)}^{n}} .$

We can write this as: $- 3 (b_{n + 2} - b_{n + 1}) = \frac{1}{8} \frac{a + c - 2 b}{{(- 6)}^{n}},$

which reduces to $b_{n + 2} - b_{n + 1} = \frac{a + c - 2 b}{4 {(- 6)}^{n + 1}},$

and this proves the claim.
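The statement of Lemma 2 is easy to test numerically. The following sketch compares the successive differences of the middle term with the closed form $\frac{a + c - 2 b}{4 {(- 6)}^{n}}$ using exact fractions; the helper names `step`, `middle_differences`, and `lemma2_prediction` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction

def step(triple):
    """One application of f to a sorted triple of Fractions."""
    a, b, c = triple
    return tuple(sorted((b, (a + 2 * b + c) / 4, (a + b + c) / 3)))

def middle_differences(a, b, c, count):
    """Return [b_1 - b_0, b_2 - b_1, ...] (count entries) as exact fractions."""
    triple = tuple(sorted(Fraction(x) for x in (a, b, c)))
    diffs = []
    for _ in range(count):
        nxt = step(triple)
        diffs.append(nxt[1] - triple[1])
        triple = nxt
    return diffs

def lemma2_prediction(a, b, c, count):
    """Closed form from Lemma 2: (a + c - 2b) / (4 * (-6)**n) for n = 0, 1, ..."""
    a, b, c = sorted(Fraction(x) for x in (a, b, c))
    return [(a + c - 2 * b) / (4 * Fraction(-6) ** n) for n in range(count)]
```

For the data (0, 0, 24), both functions return 6, -1, 1/6 for the first three differences.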

The upshot of Lemma 2 is that each subsequent iteration “steals $\frac{1}{6}$ from a (and from c) and gives it to b,” which means we can set up the equation $\sum_{n = 0}^{\infty} (b_{n + 1} - b_{n}) = \sum_{n = 0}^{\infty} \frac{a + c - 2 b}{4 {(- 6)}^{n}} .$

The left-hand side is a telescoping series, and we know that $\lim_{n \to \infty} b_{n} = I$ . The right-hand side is a geometric series with common ratio $- \frac{1}{6}$ , so its sum is $\frac{a + c - 2 b}{4} \cdot \frac{6}{7}$ , and the above becomes $I - b = \frac{3}{14} (a + c - 2 b),$

or $I = \frac{1}{14} (3 a + 8 b + 3 c),$

as claimed. $□$
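As a worked illustration (the data are chosen here only as an example), take $(a, b, c) = (0, 0, 24)$ . The first iterates are $f^{(1)} = (0, 6, 8)$ , $f^{(2)} = (\frac{14}{3}, 5, 6)$ , and $f^{(3)} = (5, \frac{31}{6}, \frac{47}{9})$ , so the middle terms are $b_{0} = 0$ , $b_{1} = 6$ , $b_{2} = 5$ , $b_{3} = \frac{31}{6}$ , with differences $6, - 1, \frac{1}{6}$ , in agreement with $\frac{a + c - 2 b}{4 {(- 6)}^{n}} = \frac{6}{{(- 6)}^{n}}$ . Summing the geometric series gives $I = 0 + \frac{6}{1 + \frac{1}{6}} = \frac{36}{7} \approx 5.14$ , which matches $\frac{3 \cdot 0 + 8 \cdot 0 + 3 \cdot 24}{14} = \frac{72}{14} = \frac{36}{7}$ .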

Can this be extended to larger N, that is, to more than three data values? Unfortunately, there does not exist a universally agreed-upon notion of “percentile”; different definitions will yield different iteratiles. Thus, without specifying exactly which notion is desired, one cannot expect a general result. It seems that one gets a “symmetric convex combination” in any case.

How “good” or “useful” is the iteratile? One potential use is when there are errors in the data, so the data-set changes. What measure of central tendency minimizes the difference? Results along these lines are discussed in DiMarco, Hollingsworth, and Savitz (Citation2015) in the context of z-scores and normal distributions.

For example, if the data-set (0,0,24) were changed to (0,3,16), then the difference in the iteratiles would be zero. But of course, one is not allowed to see the data change before selecting the appropriate measure. How does one pick a central tendency measure a priori to minimize the difference in measure for an arbitrary data change? There is a general trend: when data in the middle are (expected to be) altered, use the mean, and when data at the extremes are altered, use the median. What should be used in the “in-between” cases? At least relative to the trimean, the iteratile should be used if values toward the extremes are altered more, and the trimean when values toward the middle are altered more. Also, it seems that skewed alterations suggest use of the iteratile.
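The example with (0,0,24) and (0,3,16) can be reproduced with a short script. This is a minimal sketch under the article's definitions for N = 3; the function names `measures` and `shifts` are hypothetical, and the iteratile is computed from the closed form in Theorem 1 rather than by iteration.

```python
from fractions import Fraction

def measures(a, b, c):
    """Return mean, median, trimean, and iteratile of a triple as exact fractions."""
    a, b, c = sorted(Fraction(x) for x in (a, b, c))
    return {
        "mean": (a + b + c) / 3,
        "median": b,
        "trimean": (a + 2 * b + c) / 4,
        "iteratile": (3 * a + 8 * b + 3 * c) / 14,
    }

def shifts(before, after):
    """Absolute change in each measure when the data set changes."""
    m0, m1 = measures(*before), measures(*after)
    return {name: abs(m1[name] - m0[name]) for name in m0}

result = shifts((0, 0, 24), (0, 3, 16))
```

Here the mean moves by 5/3, the median by 3, the trimean by 1/2, and the iteratile by 0 (both data sets have iteratile 36/7).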

In general, it is quite crude to have to resort to the median, as so much information is unaccounted for. The iteratile retains a bit more of that information while still reducing the effects of outliers. As one piece of data gets arbitrarily large, the non-median measures will go to infinity, but at least at different rates (mean: 1/3, trimean: 1/4, iteratile: 3/14). As a result, perhaps using the iteratile is a decent alternative to the median. But still, is there anything special about the iteratile? One could, for example, merely make a measure of central tendency like $\frac{3 a + 10 b + 3 c}{16}$ , which would grow even more slowly (rate 3/16), but would it be optimal? Is there really something coming from the limit? These are complicated, longstanding, and potentially unanswerable questions, and even if the iteratile itself turns out to be inferior, perhaps the idea of the iteratile may lead to other more useful alternatives and generalizations. At the very least, those who appreciate calculus may simply enjoy the result at a purely esthetic level.
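The growth rates quoted above are just the weights each measure places on the largest value. As a rough sketch, one can recover them by bumping the largest data point and dividing the change in the measure by the size of the bump; the dictionary `MEASURES`, the function `outlier_rate`, and the base data (0, 1, 10) are assumptions made for illustration.

```python
from fractions import Fraction

# Each measure takes a sorted triple a <= b <= c.
MEASURES = {
    "mean": lambda a, b, c: (a + b + c) / 3,
    "median": lambda a, b, c: b,
    "trimean": lambda a, b, c: (a + 2 * b + c) / 4,
    "iteratile": lambda a, b, c: (3 * a + 8 * b + 3 * c) / 14,
    "alternative": lambda a, b, c: (3 * a + 10 * b + 3 * c) / 16,
}

def outlier_rate(measure, a=0, b=1, c=10, bump=1):
    """Change in the measure per unit increase of the largest value c."""
    a, b, c, bump = (Fraction(x) for x in (a, b, c, bump))
    return (measure(a, b, c + bump) - measure(a, b, c)) / bump

rates = {name: outlier_rate(m) for name, m in MEASURES.items()}
```

This yields 1/3 for the mean, 0 for the median, 1/4 for the trimean, 3/14 for the iteratile, and 3/16 for the alternative weighting $\frac{3 a + 10 b + 3 c}{16}$ .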

Additional information

Funding

The author received no direct funding for this research.

Notes on contributors

Blane Hollingsworth

Blane Hollingsworth received his PhD from Auburn University in stochastic differential equations, and is currently a visiting scholar at Indiana University. Currently, he is studying alternatives to means and medians as measures of central tendency, especially for small sets of data with outliers/errors.

References

DiMarco, D., & Savitz, R. (2012). The M-tile means, A new class of measures of central tendency: Theory and applications. JIMS, 12, 48–56.
DiMarco, D., & Savitz, R. (2013). The M-tile deviation: A new class of measures of dispersion. International Journal of Business Research, 13, 117–124.
DiMarco, D., Hollingsworth, B., & Savitz, R. (2015). On resistant versions of the standard score. European Journal of Marketing, 15, 7–16.
