Review Article

A review of compositional data analysis and recent advances

Pages 5535-5567 | Received 01 Jun 2020, Accepted 29 Nov 2021, Published online: 16 Dec 2021

References

  • Aitchison, J. 1982. The statistical analysis of compositional data. Journal of the Royal Statistical Society: Series B 44 (2):139–77.
  • Aitchison, J. 1983. Principal component analysis of compositional data. Biometrika 70 (1):57–65. doi:10.1093/biomet/70.1.57.
  • Aitchison, J. 1985. A general class of distributions on the simplex. Journal of the Royal Statistical Society: Series B 47 (1):136–46.
  • Aitchison, J. 1989. Measures of location of compositional data sets. Mathematical Geology 21 (7):787–90. doi:10.1007/BF00893322.
  • Aitchison, J. 1992. On criteria for measure of compositional difference. Mathematical Geology 24 (4):365–79. doi:10.1007/BF00891269.
  • Aitchison, J. 2003. The statistical analysis of compositional data. Caldwell, NJ: Reprinted by The Blackburn Press.
  • Aitchison, J., and J. Bacon-Shone. 1984. Log contrast models for experiments with mixtures. Biometrika 71 (2):323–30. doi:10.1093/biomet/71.2.323.
  • Aitchison, J., and M. Greenacre. 2002. Biplots of compositional data. Journal of the Royal Statistical Society: Series C 51 (4):375–92.
  • Aitchison, J., and I. Lauder. 1985. Kernel density estimation for compositional data. Journal of the Royal Statistical Society: Series C 34 (2):129–37.
  • Alenazi, A. 2019. Regression for compositional data with compositional data as predictor variables with or without zero values. Journal of Data Science 17 (1):219–37. doi:10.6339/JDS.201901_17(1).0010.
  • Ankam, D., and N. Bouguila. 2018. Compositional data analysis with PLS-DA and security applications. In 2018 IEEE International Conference on Information Reuse and Integration (IRI), 338–45.
  • Aitchison, J., and S. Shen. 1980. Logistic-normal distributions: Some properties and uses. Biometrika 67 (2):261–72. doi:10.1093/biomet/67.2.261.
  • Atkinson, A. 1985. Plots, transformations, and regression: An introduction to graphical methods of diagnostic regression analysis. Oxford: Oxford University Press.
  • Azzalini, A., and A. Capitanio. 1999. Statistical applications of the multivariate skew normal distribution. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 61 (3):579–602. doi:10.1111/1467-9868.00194.
  • Azzalini, A., and A. Valle. 1996. The multivariate skew-normal distribution. Biometrika 83 (4):715–26. doi:10.1093/biomet/83.4.715.
  • Barceló, C., V. Pawlowsky, and E. Grunsky. 1996. Some aspects of transformations of compositional data and the identification of outliers. Mathematical Geology 28 (4):501–18. doi:10.1007/BF02083658.
  • Barceló-Vidal, C., J. Martin-Fernández, and V. Pawlowsky-Glahn. 2001. Mathematical foundations of compositional data analysis. In Proceedings of International Association for Mathematical Geology, Cancun, Mexico, volume 1.
  • Baxter, M. 1995. Standardization and transformation in principal component analysis, with applications to archaeometry. Applied Statistics 44 (4):513–27. doi:10.2307/2986142.
  • Baxter, M. 2001. Statistical modelling of artefact compositional data. Archaeometry 43 (1):131–47. doi:10.1111/1475-4754.00008.
  • Baxter, M., C. Beardah, H. Cool, and C. Jackson. 2005. Compositional data analysis of some alkaline glasses. Mathematical Geology 37 (2):183–96. doi:10.1007/s11004-005-1308-3.
  • Baxter, M., H. Cool, and M. Heyworth. 1990. Principal component and correspondence analysis of compositional data: Some similarities. Journal of Applied Statistics 17 (2):229–35. doi:10.1080/757582834.
  • Baxter, M., and I. Freestone. 2006. Log-ratio compositional data analysis in archaeometry. Archaeometry 48 (3):511–31. doi:10.1111/j.1475-4754.2006.00270.x.
  • Baxter, M., and C. Jackson. 2001. Variable selection in artefact compositional studies. Archaeometry 43 (2):253–68. doi:10.1111/1475-4754.00017.
  • Bear, J., and D. Billheimer. 2016. A logistic normal mixture model for compositional data allowing essential zeros. Austrian Journal of Statistics 45 (4):3–23. doi:10.17713/ajs.v45i4.117.
  • Brandt, P. T., B. L. Monroe, and J. T. Williams. 1999. Time series models for compositional data. Unpublished paper. Department of Political Science, Indiana University, Bloomington.
  • Bruno, F., F. Greco, and M. Ventrucci. 2016. Non-parametric regression on compositional covariates using Bayesian P-splines. Statistical Methods & Applications 25 (1):75–88. doi:10.1007/s10260-015-0339-2.
  • Brunsdon, T. M., and T. Smith. 1998. The time series analysis of compositional data. Journal of Official Statistics 14 (3):237.
  • Buccianti, A., G. Mateu-Figueras, and V. Pawlowsky-Glahn. 2006. Compositional data analysis in the geosciences: From theory to practice. London: Geological Society of London.
  • Butler, A., and C. Glasbey. 2008. A latent Gaussian model for compositional data with zeros. Journal of the Royal Statistical Society: Series C 57 (5):505–20.
  • Cannings, C., and A. Edwards. 1968. Natural selection and the de Finetti diagram. Annals of Human Genetics 31 (4):421–8. doi:10.1111/j.1469-1809.1968.tb00575.x.
  • Cao, Y., W. Lin, and H. Li. 2018. Two-sample tests of high-dimensional means for compositional data. Biometrika 105 (1):115–32. doi:10.1093/biomet/asx060.
  • Chacón, J., G. Mateu-Figueras, and J. Martín-Fernández. 2011. Gaussian kernels for density estimation with compositional data. Computers & Geosciences 37 (5):702–11. doi:10.1016/j.cageo.2009.12.011.
  • Combettes, P. L., and C. L. Müller. 2021. Regression models for compositional data: General log-contrast formulations, proximal optimization, and microbiome data applications. Statistics in Biosciences 13 (2):217–42. doi:10.1007/s12561-020-09283-2.
  • Connor, R. J., and J. E. Mosimann. 1969. Concepts of independence for proportions with a generalization of the Dirichlet distribution. Journal of the American Statistical Association 64 (325):194–206. doi:10.1080/01621459.1969.10500963.
  • Cuesta-Albertos, J. A., A. Cuevas, and R. Fraiman. 2009. On projection-based tests for directional and compositional data. Statistics and Computing 19 (4):367–80. doi:10.1007/s11222-008-9098-3.
  • Di Marzio, M., A. Panzera, and C. Venieri. 2015. Non-parametric regression for compositional data. Statistical Modelling 15 (2):113–33. doi:10.1177/1471082X14535522.
  • Di Palma, M. A., P. Filzmoser, M. Gallo, and K. Hron. 2018. A robust parafac model for compositional data. Journal of Applied Statistics 45 (8):1347–69. doi:10.1080/02664763.2017.1381669.
  • Dobigeon, N., and J. Tourneret. 2007. Truncated multivariate Gaussian distribution on a simplex. Technical report, Univ. Toulouse, France.
  • Egozcue, J. J., and V. Pawlowsky-Glahn. 2005. Groups of parts and their balances in compositional data analysis. Mathematical Geology 37 (7):795–828. doi:10.1007/s11004-005-7381-9.
  • Egozcue, J., V. Pawlowsky-Glahn, G. Mateu-Figueras, and C. Barceló-Vidal. 2003. Isometric logratio transformations for compositional data analysis. Mathematical Geology 35 (3):279–300. doi:10.1023/A:1023818214614.
  • Endres, D. M., and J. E. Schindelin. 2003. A new metric for probability distributions. IEEE Transactions on Information Theory 49 (7):1858–60.
  • Erb, I. 2020. Partial correlations in compositional data analysis. Applied Computing and Geosciences 6:100026. doi:10.1016/j.acags.2020.100026.
  • Erb, I., and C. Notredame. 2016. How should we measure proportionality on relative gene expression data? Theory in Biosciences 135 (1-2):21–36. doi:10.1007/s12064-015-0220-8.
  • Fiksel, J., S. Zeger, and A. Datta. 2021. A transformation-free linear regression for compositional outcomes and predictors. Biometrics. doi:10.1111/biom.13465.
  • Filzmoser, P., K. Hron, and C. Reimann. 2009a. Principal component analysis for compositional data with outliers. Environmetrics 20 (6):621–32. doi:10.1002/env.966.
  • Filzmoser, P., K. Hron, C. Reimann, and R. Garrett. 2009b. Robust factor analysis for compositional data. Computers & Geosciences 35 (9):1854–61. doi:10.1016/j.cageo.2008.12.005.
  • Filzmoser, P., K. Hron, and M. Templ. 2012. Discriminant analysis for compositional data and robust parameter estimation. Computational Statistics 27 (4):585–604. doi:10.1007/s00180-011-0279-8.
  • Filzmoser, P., K. Hron, and M. Templ. 2018. Applied compositional data analysis. Cham, Switzerland: Springer.
  • Gallo, M. 2010. Discriminant partial least squares analysis on compositional data. Statistical Modelling 10 (1):41–56. doi:10.1177/1471082X0801000103.
  • Gallo, M. 2015. Tucker3 model for compositional data. Communications in Statistics – Theory and Methods 44 (21):4441–53. doi:10.1080/03610926.2013.798664.
  • Ghosh, D., and R. Chakrabarti. 2009. Joint variable selection and classification with immunohistochemical data. Biomarker Insights 4:BMI–S2465.
  • Glover, D. M., and P. K. Hopke. 1994. Exploration of multivariate atmospheric particulate compositional data by projection pursuit. Atmospheric Environment 28 (8):1411–24. doi:10.1016/1352-2310(94)90204-6.
  • Godichon-Baggioni, A., C. Maugis-Rabusseau, and A. Rau. 2019. Clustering transformed compositional data using K-means, with applications in gene expression and bicycle sharing system data. Journal of Applied Statistics 46 (1):47–65. doi:10.1080/02664763.2018.1454894.
  • Gourieroux, C., A. Monfort, and A. Trognon. 1984. Pseudo maximum likelihood methods: Theory. Econometrica 52 (3):681–700. doi:10.2307/1913471.
  • Graf, M. 2020. Regression for compositions based on a generalization of the Dirichlet distribution. Statistical Methods & Applications 29:913–936.
  • Graffelman, J., V. Pawlowsky-Glahn, J. J. Egozcue, and A. Buccianti. 2017. Compositional canonical correlation analysis. bioRxiv, 144584.
  • Greenacre, M. 2007. Correspondence analysis in practice, 3rd ed. Boca Raton, FL: Chapman & Hall.
  • Greenacre, M. 2010. Log-ratio analysis is a limiting case of correspondence analysis. Mathematical Geosciences 42 (1):129–34. doi:10.1007/s11004-008-9212-2.
  • Greenacre, M. 2011. Measuring subcompositional incoherence. Mathematical Geosciences 43 (6):681–93. doi:10.1007/s11004-011-9338-5.
  • Greenacre, M. 2018. Compositional data analysis in practice. Boca Raton, FL: CRC Press.
  • Greenacre, M. 2019. Variable selection in compositional data analysis using pairwise logratios. Mathematical Geosciences 51 (5):649–82. doi:10.1007/s11004-018-9754-x.
  • Greenacre, M. 2021. Compositional data analysis. Annual Review of Statistics and Its Application 8:1–27.
  • Gueorguieva, R., R. Rosenheck, and D. Zelterman. 2008. Dirichlet component regression and its applications to psychiatric data. Computational Statistics & Data Analysis 52 (12):5344–55. doi:10.1016/j.csda.2008.05.030.
  • Hastie, T., and F. Little. 1987. Principal profiles. In Proceedings of 1st Symposium on the Interface between Computer Science and Statistics, 243–9.
  • Hijazi, R. 2011. An EM-algorithm based method to deal with rounded zeros in compositional data under Dirichlet models. In Proceedings of the 4th Compositional Data Analysis Workshop, Girona, Spain.
  • Hijazi, R., and R. Jernigan. 2009. Modelling compositional data using Dirichlet regression models. Journal of Applied Probability and Statistics 4 (1):77–91.
  • Hijazi, R. 2009. Dealing with rounded zeros in compositional data under Dirichlet models. In Tenth Islamic Countries Conference on Statistical Sciences (ICCS-X), Volume II, 701–7. Lahore, Pakistan: The Islamic Countries Society of Statistical Sciences (2010).
  • Hron, K., P. Filzmoser, S. Donevska, and E. Fišerová. 2013. Covariance-based variable selection for compositional data. Mathematical Geosciences 45 (4):487–98. doi:10.1007/s11004-013-9450-9.
  • Hron, K., P. Filzmoser, and K. Thompson. 2012. Linear regression with compositional explanatory variables. Journal of Applied Statistics 39 (5):1115–28. doi:10.1080/02664763.2011.644268.
  • Iyengar, M., and D. K. Dey. 1998. Box–Cox transformations in Bayesian analysis of compositional data. Environmetrics 9 (6):657–71. doi:10.1002/(SICI)1099-095X(199811/12)9:6<657::AID-ENV329>3.0.CO;2-1.
  • Jackson, C., and M. Baxter. 1999. Variable selection in archaeometry: The statistical analysis of glass compositional data. BAR International Series 757:159–62.
  • Jelsema, C., and R. Paul. 2013. Spatial mixed effects model for compositional data with applications to coal geology. International Journal of Coal Geology 114:33–43. doi:10.1016/j.coal.2013.04.004.
  • Jeon, J. M., and B. U. Park. 2020. Additive regression with Hilbertian responses. The Annals of Statistics 48 (5):2671–97. doi:10.1214/19-AOS1902.
  • Katz, J., and G. King. 1999. A statistical model for multiparty electoral data. American Political Science Review 93 (1):15–32. doi:10.2307/2585758.
  • Kent, J. T. 1982. The Fisher-Bingham distribution on the sphere. Journal of the Royal Statistical Society: Series B 44 (1):71–80.
  • Kobos, L., C. R. Ferreira, T. J. Sobreira, B. Rajwa, and J. Shannahan. 2021. A novel experimental workflow to determine the impact of storage parameters on the mass spectrometric profiling and assessment of representative phosphatidylethanolamine lipids in mouse tissues. Analytical and Bioanalytical Chemistry 413 (7):1837–13. doi:10.1007/s00216-020-03151-0.
  • Lancaster, H. 1965. The Helmert matrices. The American Mathematical Monthly 72 (1):4–12. doi:10.1080/00029890.1965.11970483.
  • Larrosa, J. M. 2003. A compositional statistical analysis of capital stock. In Proceedings of the 1st Compositional Data Analysis Workshop, Girona, Spain.
  • Le, H., and C. Small. 1999. Multidimensional scaling of simplex shapes. Pattern Recognition 32 (9):1601–13. doi:10.1016/S0031-3203(99)00023-0.
  • Lee, R. D., and L. R. Carter. 1992. Modeling and forecasting United-States mortality. Journal of the American Statistical Association 87 (419):659–71.
  • Leemis, L. M., and J. T. McQueston. 2008. Univariate distribution relationships. The American Statistician 62 (1):45–53. doi:10.1198/000313008X270448.
  • Leininger, T. J., A. E. Gelfand, J. M. Allen, and J. A. Silander, Jr. 2013. Spatial regression modeling for compositional data with many zeros. Journal of Agricultural, Biological, and Environmental Statistics 18 (3):314–34. doi:10.1007/s13253-013-0145-y.
  • Lin, W., P. Shi, R. Feng, and H. Li. 2014. Variable selection in regression with compositional covariates. Biometrika 101 (4):785–97. doi:10.1093/biomet/asu031.
  • Lovell, D., V. Pawlowsky-Glahn, J. J. Egozcue, S. Marguerat, and J. Bähler. 2015. Proportionality: A valid alternative to correlation for relative data. PLOS Computational Biology 11 (3):e1004075. doi:10.1371/journal.pcbi.1004075.
  • Lu, J., P. Shi, and H. Li. 2019. Generalized linear models with linear constraints for microbiome compositional data. Biometrics 75 (1):235–44. doi:10.1111/biom.12956.
  • Marden, J. I. 1998. Bivariate QQ-plots and spider web plots. Statistica Sinica 8 (3):813–26.
  • Mardia, K., and P. Jupp. 2000. Directional statistics. Chichester: John Wiley & Sons Inc.
  • Martín-Fernández, J. A., C. Barceló-Vidal, and V. Pawlowsky-Glahn. 2003. Dealing with zeros and missing values in compositional data sets using nonparametric imputation. Mathematical Geology 35 (3):253–78. doi:10.1023/A:1023866030544.
  • Martín-Fernández, J., C. Barceló-Vidal, and V. Pawlowsky-Glahn. 1998. A critical approach to non-parametric classification of compositional data. In Advances in data science and classification, 49–56. Heidelberg, Germany: Springer.
  • Martín-Fernández, J., K. Hron, M. Templ, P. Filzmoser, and J. Palarea-Albaladejo. 2012. Model-based replacement of rounded zeros in compositional data: Classical and robust approaches. Computational Statistics & Data Analysis 56 (9):2688–704. doi:10.1016/j.csda.2012.02.012.
  • Martín-Fernández, J. A., and S. Thió-Henestrosa. 2016. Compositional data analysis. Cham, Switzerland: Springer.
  • Mateu-Figueras, G., and V. Pawlowsky-Glahn. 2007. The skew-normal distribution on the simplex. Communications in Statistics - Theory and Methods 36 (9):1787–802. doi:10.1080/03610920601126258.
  • Mateu-Figueras, G., V. Pawlowsky-Glahn, and C. Barceló-Vidal. 2005. The additive logistic skew-normal distribution on the simplex. Stochastic Environmental Research and Risk Assessment 19 (3):205–14. doi:10.1007/s00477-004-0225-1.
  • Meng, J. 2010. Multinomial logit PLS regression of compositional data. In 2010 Second International Conference on Communication Systems, Networks and Applications, volume 2, 288–291. IEEE.
  • Migliorati, S., A. Ongaro, and G. S. Monti. 2017. A structured Dirichlet mixture model for compositional data: Inferential and applicative issues. Statistics and Computing 27 (4):963–83. doi:10.1007/s11222-016-9665-y.
  • Miller, W. E. 2002. Revisiting the geometry of a ternary diagram with the half-taxi metric. Mathematical Geology 34 (3):275–90. doi:10.1023/A:1014842906442.
  • Mishra, A., et al. 2019. Robust regression with compositional covariates. arXiv preprint arXiv:1909.04990.
  • Monti, G., G. Mateu-Figueras, V. Pawlowsky-Glahn, and J. Egozcue. 2011. The shifted-scaled Dirichlet distribution in the simplex. In Proceedings of the 4th Compositional Data Analysis Workshop, Girona, Spain.
  • Mooijaart, A., P. G. van der Heijden, and L. A. van der Ark. 1999. A least squares algorithm for a mixture model for compositional data. Computational Statistics & Data Analysis 30 (4):359–79. doi:10.1016/S0167-9473(98)00098-X.
  • Murteira, J. M. R., and J. J. S. Ramalho. 2016. Regression analysis of multivariate fractional data. Econometric Reviews 35 (4):515–52. doi:10.1080/07474938.2013.806849.
  • Neocleous, T., C. Aitken, and G. Zadora. 2011. Transformations for compositional data with zeros with an application to forensic evidence evaluation. Chemometrics and Intelligent Laboratory Systems 109 (1):77–85. doi:10.1016/j.chemolab.2011.08.003.
  • Oeppen, J. 2008. Coherent forecasting of multiple-decrement life tables: A test using Japanese cause of death data. In Proceedings of the 3rd Compositional Data Analysis Workshop, Girona, Spain.
  • Ongaro, A., and S. Migliorati. 2013. A generalization of the Dirichlet distribution. Journal of Multivariate Analysis 114:412–26. doi:10.1016/j.jmva.2012.07.007.
  • Österreicher, F., and I. Vajda. 2003. A new class of metric divergences on probability spaces and its applicability in statistics. Annals of the Institute of Statistical Mathematics 55 (3):639–53. doi:10.1007/BF02517812.
  • Owen, A. 2001. Empirical likelihood. Boca Raton: Chapman & Hall/CRC.
  • Palarea-Albaladejo, J., and J. A. Martín-Fernández. 2015. zCompositions—R package for multivariate imputation of left-censored data under a compositional approach. Chemometrics and Intelligent Laboratory Systems 143:85–96. doi:10.1016/j.chemolab.2015.02.019.
  • Palarea-Albaladejo, J., and J. Martín-Fernández. 2008. A modified EM alr-algorithm for replacing rounded zeros in compositional data sets. Computers & Geosciences 34 (8):902–17. doi:10.1016/j.cageo.2007.09.015.
  • Palarea-Albaladejo, J., J. A. Martín-Fernández, and J. Gómez-García. 2007. A parametric approach for dealing with compositional rounded zeros. Mathematical Geology 39 (7):625–45. doi:10.1007/s11004-007-9100-1.
  • Palarea-Albaladejo, J., J. A. Martín-Fernández, and J. A. Soto. 2012. Dealing with distances and transformations for fuzzy C-means clustering of compositional data. Journal of Classification 29 (2):144–69. doi:10.1007/s00357-012-9105-4.
  • Palmer, M. J., and G. B. Douglas. 2008. A Bayesian statistical model for end member analysis of sediment geochemistry, incorporating spatial dependences. Journal of the Royal Statistical Society: Series C 57 (3):313–27.
  • Pantazis, Y., M. Tsagris, and A. T. Wood. 2019. Gaussian asymptotic limits for the α-transformation in the analysis of compositional data. Sankhya A 81 (1):63–82. doi:10.1007/s13171-018-00160-1.
  • Papageorgiou, I., M. Baxter, and M. Cau. 2001. Model-based cluster analysis of artefact compositional data. Archaeometry 43 (4):571–88. doi:10.1111/1475-4754.00037.
  • Pawlowsky-Glahn, V., and A. Buccianti. 2011. Compositional data analysis. Chichester: Wiley Online Library.
  • Pawlowsky-Glahn, V., J. J. Egozcue, and R. Tolosana-Delgado. 2015. Modeling and analysis of compositional data. Chichester: John Wiley & Sons.
  • Pawlowsky-Glahn, V., and R. A. Olea. 2004. Geostatistical analysis of compositional data. New York: Oxford University Press.
  • Quinn, T. P. 2018. Visualizing balances of compositional data: A new alternative to balance dendrograms. F1000Research 7:1278. doi:10.12688/f1000research.15858.1.
  • R Core Team. 2020. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
  • Rasmussen, C. L., J. Palarea-Albaladejo, M. S. Johansson, P. Crowley, M. L. Stevens, N. Gupta, K. Karstad, and A. Holtermann. 2020. Zero problems with compositional data of physical behaviors: A comparison of three zero replacement methods. International Journal of Behavioral Nutrition and Physical Activity 17 (1):1–10. doi:10.1186/s12966-020-01029-z.
  • Rayens, W. S., and C. Srinivasan. 1991b. Estimation in compositional data analysis. Journal of Chemometrics 5 (4):361–74. doi:10.1002/cem.1180050405.
  • Rayens, W. S., and C. Srinivasan. 1994. Dependence properties of generalized Liouville distributions on the simplex. Journal of the American Statistical Association 89 (428):1465–70. doi:10.1080/01621459.1994.10476885.
  • Rayens, W., and C. Srinivasan. 1991a. Box–Cox transformations in the analysis of compositional data. Journal of Chemometrics 5 (3):227–39. doi:10.1002/cem.1180050310.
  • Reyment, R. A. 1997. Multiple group principal component analysis. Mathematical Geology 29 (1):1–16. doi:10.1007/BF02769617.
  • Rodríguez-Vignoli, J., and F. Rowe. 2018. How is internal migration reshaping metropolitan populations in Latin America? A new method and new evidence. Population Studies 72 (2):253–73. doi:10.1080/00324728.2017.1416155.
  • Rousseeuw, P. J., and K. V. Driessen. 1999. A fast algorithm for the minimum covariance determinant estimator. Technometrics 41 (3):212–23. doi:10.1080/00401706.1999.10485670.
  • Savazzi, E., and R. Reyment. 1999. Aspects of multivariate statistical analysis in geology. Amsterdam, Netherlands: Elsevier.
  • Scealy, J., P. De Caritat, E. C. Grunsky, M. T. Tsagris, and A. Welsh. 2015. Robust principal component analysis for power transformed compositional data. Journal of the American Statistical Association 110 (509):136–48. doi:10.1080/01621459.2014.990563.
  • Scealy, J., and A. Welsh. 2011a. Properties of a square root transformation regression model. In Proceedings of the 4th Compositional Data Analysis Workshop, Girona, Spain.
  • Scealy, J., and A. Welsh. 2011b. Regression for compositional data by using distributions defined on the hypersphere. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73 (3):351–75. doi:10.1111/j.1467-9868.2010.00766.x.
  • Scealy, J., and A. Welsh. 2014a. Colours and cocktails: Compositional data analysis 2013 Lancaster lecture. Australian & New Zealand Journal of Statistics 56 (2):145–69. doi:10.1111/anzs.12073.
  • Scealy, J., and A. Welsh. 2014b. Fitting Kent models to compositional data with small concentration. Statistics and Computing 24 (2):165–79. doi:10.1007/s11222-012-9361-5.
  • Scealy, J., and A. Welsh. 2017. A directional mixed effects model for compositional expenditure data. Journal of the American Statistical Association 112 (517):24–36. doi:10.1080/01621459.2016.1189336.
  • Sharp, W. 2006. The graph median–A stable alternative measure of central tendency for compositional data sets. Mathematical Geology 38 (2):221–9. doi:10.1007/s11004-006-9026-z.
  • Shi, P., A. Zhang, and H. Li. 2016. Regression analysis for microbiome compositional data. The Annals of Applied Statistics 10 (2):1019–40. doi:10.1214/16-AOAS928.
  • Shimizu, T. K., F. Louzada, A. K. Suzuki, and R. S. Ehlers. 2018. Modeling compositional regression with uncorrelated and correlated errors: A Bayesian approach. Journal of Data Science 16:221–50.
  • Smith, T., and T. Brunsdon. 1989. The time series analysis of compositional data. In Proceedings of the Survey Research Methods Section, American Statistical Association, 26–32.
  • Smith, B., and W. Rayens. 2002. Conditional generalized Liouville distributions on the simplex. Statistics 36 (2):185–94. doi:10.1080/02331880212046.
  • Sohn, M. B., and H. Li. 2019. Compositional mediation analysis for microbiome studies. The Annals of Applied Statistics 13 (1):661–81. doi:10.1214/18-AOAS1210.
  • Srivastava, D., J. Boyett, C. Jackson, X. Tong, and S. Rai. 2007. A comparison of permutation Hotelling’s T² test and log-ratio test for analyzing compositional data. Communications in Statistics - Theory and Methods 36 (2):415–31. doi:10.1080/03610920600974021.
  • Stanley, C. R. 1990. Descriptive statistics for N-dimensional closed arrays: A spherical coordinate approach. Mathematical Geology 22 (8):933–56. doi:10.1007/BF00890118.
  • Stephens, M. A. 1982. Use of the von Mises distribution to analyse continuous proportions. Biometrika 69 (1):197–203. doi:10.1093/biomet/69.1.197.
  • Stewart, C. 2017. An approach to measure distance between compositional diet estimates containing essential zeros. Journal of Applied Statistics 44 (7):1137–52. doi:10.1080/02664763.2016.1193846.
  • Stewart, C., and C. Field. 2011. Managing the essential zeros in quantitative fatty acid signature analysis. Journal of Agricultural, Biological, and Environmental Statistics 16 (1):45–69. doi:10.1007/s13253-010-0040-8.
  • Sun, Z., W. Xu, X. Cong, and K. Chen. 2019. Log-contrast regression with functional compositional predictors: Linking preterm infant’s gut microbiome trajectories in early postnatal period to neurobehavioral outcome. arXiv preprint arXiv:1808.02403.
  • Susin, A., Y. Wang, K.-A. Lê Cao, and M. L. Calle. 2020. Variable selection in microbiome compositional data analysis. NAR Genomics and Bioinformatics 2 (2):lqaa029.
  • Tang, Z.-Z., and G. Chen. 2019. Zero-inflated generalized Dirichlet multinomial regression model for microbiome compositional data analysis. Biostatistics 20 (4):698–713. doi:10.1093/biostatistics/kxy025.
  • Tjelmeland, H., and K. V. Lund. 2003. Bayesian modelling of spatial compositional data. Journal of Applied Statistics 30 (1):87–100. doi:10.1080/0266476022000018547.
  • Tolosana-Delgado, R., N. Otero, V. Pawlowsky-Glahn, and A. Soler. 2005. Latent compositional factors in the Llobregat River Basin (Spain) hydrogeochemistry. Mathematical Geology 37 (7):681–702. doi:10.1007/s11004-005-7375-7.
  • Tsagris, M. 2015a. A novel, divergence based, regression for compositional data. In Proceedings of the 28th Panhellenic Statistics Conference, April, Athens, Greece.
  • Tsagris, M. 2015b. Regression analysis with compositional data containing zero values. Chilean Journal of Statistics 6 (2):47–57.
  • Tsagris, M. 2021. The k-NN algorithm for compositional data: A revised approach with and without zero values present. Journal of Data Science 12 (3):519–34. doi:10.6339/JDS.201407_12(3).0008.
  • Tsagris, M., A. Alenazi, and C. Stewart. 2021. Non-parametric regression models for compositional data. arXiv preprint arXiv:2002.05137.
  • Tsagris, M., C. Beneki, and H. Hassani. 2014. On the folded normal distribution. Mathematics 2 (1):12–28. doi:10.3390/math2010012.
  • Tsagris, M., S. Preston, and A. T. Wood. 2016. Improved classification for compositional data using the α-transformation. Journal of Classification 33 (2):243–61. doi:10.1007/s00357-016-9207-5.
  • Tsagris, M., S. Preston, and A. T. Wood. 2017. Nonparametric hypothesis testing for equality of means on the simplex. Journal of Statistical Computation and Simulation 87 (2):406–22. doi:10.1080/00949655.2016.1216554.
  • Tsagris, M., S. Preston, and A. Wood. 2011. A data-based power transformation for compositional data. In Proceedings of the 4th Compositional Data Analysis Workshop, Girona, Spain.
  • Tsagris, M., and C. Stewart. 2018. A Dirichlet regression model for compositional data with zeros. Lobachevskii Journal of Mathematics 39 (3):398–412. doi:10.1134/S1995080218030198.
  • Tsagris, M., and C. Stewart. 2020. A folded model for compositional data analysis. Australian & New Zealand Journal of Statistics 62 (2):249–77. doi:10.1111/anzs.12289.
  • Upton, G. J. G. 2001. A toroidal scatter diagram for ternary variables. The American Statistician 55 (3):240–3. doi:10.1198/000313001317098257.
  • Van den Boogaart, K. G., and R. Tolosana-Delgado. 2013. Analyzing compositional data with R. Heidelberg, Germany: Springer.
  • van der Merwe, S. 2019. A method for Bayesian regression modelling of composition data. South African Statistical Journal 53 (1):55–64.
  • Wang, H., Q. Liu, H. M. Mok, L. Fu, and W. M. Tse. 2007. A hyperspherical transformation forecasting model for compositional data. European Journal of Operational Research 179 (2):459–68. doi:10.1016/j.ejor.2006.03.039.
  • Wang, H., J. Meng, and M. Tenenhaus. 2010. Regression modelling analysis on compositional data. In Handbook of partial least squares, 381–406. Heidelberg, Germany: Springer.
  • Wang, H., Z. Wang, and S. Wang. 2019a. Sliced inverse regression method for multivariate compositional data modeling. Statistical Papers 62 (1):361–93. doi:10.1007/s00362-019-01093-z.
  • Wang, Z., H. Wang, and S. Wang. 2019b. Linear mixed-effects model for multivariate longitudinal compositional data. Neurocomputing 335:48–58. doi:10.1016/j.neucom.2019.01.043.
  • Wang, H., Q. Yang, H. Qin, and H. Zha. 2008. Dirichlet component analysis: Feature extraction for compositional data. In Proceedings of the 25th International Conference on Machine Learning, 1128–35. ACM.
  • Watson, G., and H. Nguyen. 1985. A confidence region in a ternary diagram from point counts. Journal of the International Association for Mathematical Geology 17 (2):209–13. doi:10.1007/BF01033155.
  • Woronow, A. 1997b. Regression and discrimination analysis using raw compositional data–is it really a problem? In Proceedings of the 3rd Annual Conference of the International Association for Mathematical Geology, Barcelona, Spain, 157–62.
  • Woronow, A. 1997a. The elusive benefits of logratios. In Proceedings of the 3rd Annual Conference of the International Association for Mathematical Geology, Barcelona, Spain, 97–101.
  • Zadora, G., and T. Neocleous. 2009. Likelihood ratio model for classification of forensic evidence. Analytica Chimica Acta 642 (1-2):266–78. doi:10.1016/j.aca.2008.12.013.
  • Zadora, G., T. Neocleous, and C. Aitken. 2010. A two-level model for evidence evaluation in the presence of zeros. Journal of Forensic Sciences 55 (2):371–84. doi:10.1111/j.1556-4029.2009.01316.x.
