Abstract
The number of extant individuals within a lineage, as exemplified by counts of species across genera in a higher taxonomic category, is known to follow a highly skewed distribution. Because the sublineages (such as genera in a clade) themselves follow a random birth process, deriving the distribution of lineage sizes involves averaging the solutions of a birth and death process over the distribution of time intervals separating the origins of the lineages. In this article, we show that the resulting distributions can be represented by hypergeometric functions of the second kind. We also provide approximations of these distributions up to second order and compare them with the asymptotic distributions and numerical approximations used in previous studies. For two limiting cases, one with a relatively high rate of lineage origin and one with a low rate, we compare the cumulative distribution functions and percentiles to show that the approximations are robust over a wide range of parameters. We propose that the probability distributions of lineage size may have applications to biological problems such as the coalescence of genetic lineages and the prediction of the number of species in living and extinct higher taxa, as these systems are special instances of the underlying process analyzed in this article.
Acknowledgments
The authors thank an anonymous reviewer for comments and suggested corrections. Panagis Moschopoulos was supported by RCMI/NIH grant 5G12 RR008124 from the National Institutes of Health to the Border Biomedical Research Center (BBRC) at the University of Texas at El Paso (UTEP), and Max Shpak was supported by funding for research from the University of Texas at El Paso.