Human Nutrition and Lifestyle

The problem of latent class trajectory analysis in child growth and obesity research

Pages 1-3 | Received 13 Oct 2022, Accepted 27 Feb 2023, Published online: 19 Jun 2023
This article is part of the following collections:
Current Issues in Human Biology; Commentaries in Human Biology

It is increasingly common to see papers in human biology and cognate journals in which scientists describe unobserved or latent patterns of child growth or body mass index (BMI) development. Most of us will have read abstracts that include text along the lines of “a latent class trajectory model (LCTM) was used to identify [say] four distinct groups of children with similar growth curves.” In addition to showing a figure of the mean growth curve for each group of children, most papers investigate how exposures and/or distal outcomes are associated with class membership. While the approach might seem mysterious, its power and appeal are easy to understand. LCTMs can, quite simply, be used to ask “person-centred” questions that cannot be addressed with traditional “variable-centred” statistical methods (Jung and Wickrama 2008). The same goes for secondary analyses using the identified classes. For example, investigating how aflatoxin exposure is associated with a nominal outcome with four categories, each representing a group of infants with similar length growth curves between zero and two years, provides fundamentally different information from analysing how aflatoxin exposure is associated with length growth itself.

An LCTM, also known as a group-based trajectory model, is a form of growth mixture model (GMM) (Johnson 2021). Note that GMMs are sometimes referred to as latent class mixed models. Briefly, GMMs relax the assumption of multilevel growth curve models that all individuals share a single average trajectory. In a structural equation modelling framework, this is done by combining a latent growth curve model with a categorical latent class variable. By regressing the latent growth curve terms (eg intercept and slope(s)) on a set of dummy variables representing the categories of the latent class variable, GMMs estimate multiple average trajectories (ie one for each class). For a GMM to converge, the researcher normally must add parameter constraints. For example, it is common practice to constrain some of the growth terms (eg the cubic term in a cubic polynomial) to have zero variance and thus no covariance with the other growth terms. If all the growth terms are constrained to have zero variance (ie no between-individual differences), the result is an LCTM (Jung and Wickrama 2008). These models do not describe or capture individual-level growth curves: every child in a class is assigned the same trajectory, and all deviations from it are treated as residual error.
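For readers who find equations easier to follow, the model described above can be written out for the linear case. The notation is assumed here rather than taken from the cited papers. Here y_{ij} is the outcome (eg BMI) for child i at occasion j, t_{ij} is age, D_{ik} are the class dummy variables, and u_{0i} and u_{1i} are the child-level random effects. The residual variance is written as σ²_{jk} so that it may differ by occasion and class; simpler specifications use a single σ². Setting every random-effect variance to zero gives the LCTM.

```latex
% Linear GMM for child i at measurement occasion j (notation assumed here).
% D_{ik} = 1 if child i belongs to latent class k, otherwise 0; \Pr(D_{ik} = 1) = \pi_k.
y_{ij} = \eta_{0i} + \eta_{1i}\, t_{ij} + \varepsilon_{ij}, \qquad
\varepsilon_{ij} \mid D_{ik} = 1 \sim \mathcal{N}\!\left(0, \sigma^2_{jk}\right)

\eta_{0i} = \sum_{k=1}^{K} \beta_{0k} D_{ik} + u_{0i}, \qquad
\eta_{1i} = \sum_{k=1}^{K} \beta_{1k} D_{ik} + u_{1i}

\begin{pmatrix} u_{0i} \\ u_{1i} \end{pmatrix} \Bigm| D_{ik} = 1
  \sim \mathcal{N}\!\left(\mathbf{0},\,
  \Psi_k = \begin{pmatrix} \psi_{00,k} & \psi_{01,k} \\ \psi_{01,k} & \psi_{11,k} \end{pmatrix}\right)

% LCTM: every growth-term variance (and hence covariance) is fixed at zero,
% so u_{0i} = u_{1i} = 0 and all children in class k share one trajectory.
\Psi_k = \mathbf{0}\ \text{for all } k
  \;\Longrightarrow\;
  y_{ij} = \beta_{0k} + \beta_{1k}\, t_{ij} + \varepsilon_{ij} \quad \text{for children in class } k
```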

While GMMs assume that the population distribution of individual trajectories is composed of two or more subpopulations, LCTMs do not. How could they, when they do not capture individual-level growth curves? Instead, it is argued that LCTMs use “the trajectory groups as a statistical device for approximating the unknown distribution of trajectories across population members” (Nagin and Odgers 2010). This might be theoretically correct, but at what cost is the assumption avoided?

A model that does not capture individual-level growth curves will provide a worse overall fit for longitudinal data than a model that does. Think of it this way. If you have ∼40,000 measurements of BMI on ∼16,000 children between target assessment ages of 5–11 years, which model would provide the better fit for the data?

  A. A general linear regression model (does not describe individual growth curves)

  B. A multilevel general linear model with random intercept and slope (describes individual growth curves)

The answer is B. In this example, using data from the UK Millennium Cohort Study, the Bayesian Information Criterion (BIC) was 3000 points, or 14%, lower (ie better) for model B. By anyone’s standards, this is a massive improvement in model fit. Most researchers would never even consider model A, because it ignores the longitudinal nature of the data, which obviously needs to be accounted for given how variable BMI trajectories are. Following on from this simple example, Sijbrandij et al. (2020) have shown that GMMs in which the random effects are not constrained to be zero (ie the growth terms are allowed to have variances and covariances) provide a better fit than LCTMs in which the random effects are constrained to be zero. Even better model fits can normally be achieved by allowing the residual variances (ie errors) to differ over time and between classes. And some, normally smaller, gains in model fit can sometimes be achieved by allowing the random effect variances/covariances to differ between classes.
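Once each model's maximised log-likelihood is known, the BIC comparison above is simple arithmetic: BIC = k ln(n) − 2 ln L, where k is the number of estimated parameters and n is the sample size. Software packages differ in whether n counts measurements or individuals. The Python sketch below uses made-up log-likelihoods, chosen only so that the output lands near the 3000-point, 14% difference reported above. They are not estimates from the Millennium Cohort Study.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion, k * ln(n) - 2 * ln(L); lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def relative_reduction(bic_worse: float, bic_better: float) -> float:
    """Fractional drop in BIC when moving from the worse to the better model."""
    return (bic_worse - bic_better) / bic_worse


# Parameter counts for the two candidate models (linear in age):
#   Model A (ordinary regression): intercept, slope, residual variance -> 3
#   Model B (random intercept and slope): intercept, slope, two random-effect
#     variances, one random-effect covariance, residual variance -> 6
K_A, K_B = 3, 6
N_OBS = 40_000  # roughly the number of BMI measurements in the example

# HYPOTHETICAL log-likelihoods, invented purely to illustrate the arithmetic.
LOGLIK_A = -10_700.0
LOGLIK_B = -9_180.0

bic_a = bic(LOGLIK_A, K_A, N_OBS)
bic_b = bic(LOGLIK_B, K_B, N_OBS)
print(f"BIC A = {bic_a:.1f}, BIC B = {bic_b:.1f}")
# BIC A = 21431.8, BIC B = 18423.6
print(f"difference = {bic_a - bic_b:.1f} points, "
      f"{100 * relative_reduction(bic_a, bic_b):.1f}% lower for B")
# difference = 3008.2 points, 14.0% lower for B
```

Model B pays a larger penalty for its three extra parameters (about 32 BIC points here). That penalty is trivial next to the gain in log-likelihood that comes from describing each child's own growth curve.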

One of the explicit assumptions of general linear regression is that the errors are independent of each other. This assumption would be violated in model A because each child’s errors (residuals) would be serially correlated. LCTMs suffer from the same problem. By failing to account for between-child variation around the class mean trajectory, they leave an autocorrelation structure among the errors that is specific to each class. Gilthorpe et al. (2014) have reported on this phenomenon and shown how important gains in model fit can be achieved by modelling the resulting class-specific autocorrelation structure. Doing this, however, is analytically and computationally challenging. Given this, it seems nonsensical to fit a heavily constrained model and then have to model a complex class-specific autocorrelation structure, rather than simply develop a GMM with fewer constraints.
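The class-specific autocorrelation can be seen in a small simulation. The Python sketch below generates hypothetical BMI-like data from two classes with child-level random intercepts and slopes; all parameter values and function names are invented for illustration. It then fits a single mean line per class, treating class membership as known (the most favourable case for an LCTM), and correlates each child's residuals at adjacent ages. Setting the random-effect standard deviations to zero, so that the data really do follow an LCTM, serves as a contrast.

```python
import random

AGES = (5, 7, 9, 11)  # hypothetical assessment ages; the simulation centres them at 8


def simulate(n_children=2000, sd_u0=1.5, sd_u1=0.15, sd_e=0.4, seed=1):
    """Hypothetical two-class data: a class-specific mean line plus each child's
    own intercept (sd_u0) and slope (sd_u1) deviations, plus noise (sd_e)."""
    rng = random.Random(seed)
    class_means = {0: (16.0, 0.2), 1: (18.0, 0.6)}  # (level at age 8, slope per year)
    data = []  # list of (class label, [outcome at each age])
    for _ in range(n_children):
        k = rng.randrange(2)
        b0, b1 = class_means[k]
        u0, u1 = rng.gauss(0.0, sd_u0), rng.gauss(0.0, sd_u1)
        ys = [b0 + u0 + (b1 + u1) * (a - 8) + rng.gauss(0.0, sd_e) for a in AGES]
        data.append((k, ys))
    return data


def fit_line(xs, ys):
    """Ordinary least squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def lag1_residual_correlation(data, k):
    """Fit one mean line to class k (membership known, as in a perfect LCTM),
    then correlate each child's residuals at adjacent ages."""
    rows = [ys for c, ys in data if c == k]
    xs = [a for _ in rows for a in AGES]
    flat = [y for ys in rows for y in ys]
    b0, b1 = fit_line(xs, flat)
    earlier, later = [], []
    for ys in rows:
        res = [y - (b0 + b1 * a) for a, y in zip(AGES, ys)]
        earlier.extend(res[:-1])
        later.extend(res[1:])
    return pearson(earlier, later)


with_re = simulate()
without_re = simulate(sd_u0=0.0, sd_u1=0.0)
for k in (0, 1):
    print(f"class {k}: lag-1 residual r = {lag1_residual_correlation(with_re, k):.2f} "
          f"(random effects), {lag1_residual_correlation(without_re, k):.2f} (none)")
```

With child-level variation present, the lag-1 correlation within each class is large and positive, even though the class mean line is correct. Without that variation, the correlation is close to zero.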

By this stage, it should be clear that LCTMs will inevitably provide a bad fit for longitudinal data. But does this matter if they identify the same latent classes (eg number, composition, posterior probabilities, etc) as a fully developed GMM? Maybe not. The fundamental problem, though, is that they do not. Nagin and Odgers acknowledge that “the addition of random effects to a group-based model can result in the use of fewer trajectory groups because their addition allows for more within-group variability in individual-level trajectories” (Nagin and Odgers 2010). They see this as being at odds with the goal of LCTMs to reduce within-group variability. I would not say that these models reduce within-group variability in the trajectories; they just do not parameterise it. For most readers, I imagine what is most important is developing and choosing the model with the “correct” classes. In a recent simulation study, Mesidor et al. (2022) demonstrated how LCTMs fail to recover the correct number of classes and/or shapes of the mean trajectories under different scenarios. In perhaps the most relevant scenario, Mesidor et al. simulated data in which there were three latent classes and time point-specific overlap in the distribution of outcome Y between the classes. Such overlap is arguably always going to occur in studies of child growth. Each class in the simulated data comprised one-third of the sample, and the mean linear trajectories can be described as high, low, and high-to-low. An LCTM did identify three classes, but the sample proportions were 13, 83, and 4%, and the mean trajectories were close together and parallel to each other (with a shallow negative slope). This finding goes some way towards explaining why many LCTM studies find a set of parallel trajectories when theoretically more interesting classes must exist. This phenomenon of finding parallel trajectories has even been given a name: the “rainbow effect” (Vachon et al. 2017).
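An LCTM can return groups that do not correspond to real sub-populations. To illustrate why, the Python sketch below fits a linear LCTM (latent class growth analysis with one shared residual variance) by the EM algorithm. The data are hypothetical and simulated from a single population with no latent classes at all. This is a minimal illustration, not a reimplementation of PROC TRAJ or traj, and the function names and parameter values are invented. Because the analyst fixes the number of classes K, the fit necessarily returns K mean trajectories. The simulated children differ mostly in level, so the classes would be expected to differ mainly in intercept, giving a stacked, roughly parallel pattern like the rainbow effect. That expectation should be checked by running the code; it is not a reported result.

```python
import math
import random

AGES = (5, 7, 9, 11)
T = [a - 8 for a in AGES]  # ages centred at 8 years


def simulate_homogeneous(n_children=600, seed=2):
    """Hypothetical data from ONE population: a single mean line plus
    child-level random intercepts and slopes, ie no latent classes."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_children):
        u0, u1 = rng.gauss(0.0, 1.5), rng.gauss(0.0, 0.15)
        data.append([17.0 + u0 + (0.4 + u1) * t + rng.gauss(0.0, 0.4) for t in T])
    return data


def fit_lctm(data, k, n_iter=100):
    """Linear LCTM (every child in class j follows b0_j + b1_j * t, with one
    shared residual variance) fitted by expectation-maximisation."""
    n, m = len(data), len(T)
    # Deterministic start: rank children by mean outcome, cut into k equal groups.
    order = sorted(range(n), key=lambda i: sum(data[i]) / m)
    resp = [[0.0] * k for _ in range(n)]
    for rank, i in enumerate(order):
        resp[i][rank * k // n] = 1.0
    trace = []
    for _ in range(n_iter):
        # M-step: class proportions, weighted least-squares lines, shared variance.
        props = [sum(row[j] for row in resp) / n for j in range(k)]
        lines = []
        for j in range(k):
            w_sum = sx = sy = sxx = sxy = 0.0
            for row, ys in zip(resp, data):
                w = row[j]
                for x, y in zip(T, ys):
                    w_sum += w
                    sx += w * x
                    sy += w * y
                    sxx += w * x * x
                    sxy += w * x * y
            b1 = (w_sum * sxy - sx * sy) / (w_sum * sxx - sx * sx)
            lines.append(((sy - b1 * sx) / w_sum, b1))
        rss = sum(row[j] * sum((y - lines[j][0] - lines[j][1] * x) ** 2
                               for x, y in zip(T, ys))
                  for row, ys in zip(resp, data) for j in range(k))
        sigma2 = rss / (n * m)
        # E-step: posterior class probabilities and observed-data log-likelihood.
        loglik, resp = 0.0, []
        for ys in data:
            logp = []
            for j in range(k):
                b0, b1 = lines[j]
                sq = sum((y - b0 - b1 * x) ** 2 for x, y in zip(T, ys))
                logp.append(math.log(props[j])
                            - 0.5 * m * math.log(2 * math.pi * sigma2)
                            - sq / (2 * sigma2))
            top = max(logp)
            lse = top + math.log(sum(math.exp(v - top) for v in logp))  # log-sum-exp
            loglik += lse
            resp.append([math.exp(v - lse) for v in logp])
        trace.append(loglik)
    return props, lines, sigma2, trace


data = simulate_homogeneous()
props, lines, sigma2, trace = fit_lctm(data, k=3)
for p, (b0, b1) in sorted(zip(props, lines), key=lambda pair: pair[1][0]):
    print(f"share {p:.2f}: level at age 8 = {b0:.2f}, slope = {b1:+.2f} per year")
print(f"log-likelihood rose from {trace[0]:.1f} to {trace[-1]:.1f}")
```

One way to explore how the constrained model behaves is to swap in data with three genuinely different trajectory shapes and substantial overlap, loosely in the spirit of the Mesidor et al. scenario. This sketch does not reproduce their design.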

Given the limitations of LCTMs, one wonders why this approach is so commonly used to identify sub-groups of children in longitudinal data (Mattsson et al. 2019). The answer must partly lie in the availability of statistical programmes like PROC TRAJ in SAS or TRAJ in Stata, which make fitting an advanced model an incredibly simple task. With just one line of code, anyone can fit an LCTM. But developing that model into a good GMM can be a massive and expensive undertaking, requiring months of learning and statistical analysis, as well as new software/hardware. If you or a student/colleague want to take on this challenge, I cannot recommend strongly enough the Guidelines for Reporting on Latent Trajectory Studies (GRoLTS) (van de Schoot et al. 2017). Following this guidance and checklist will ensure that you develop a good model and report the right information in any paper. Just as many journals require you to submit generic checklists (eg STROBE: Strengthening the Reporting of Observational Studies in Epidemiology), journals should require authors to submit a GRoLTS checklist if they have conducted any form of growth mixture modelling. If the GRoLTS guidance and checklist are closely followed, I struggle to imagine any scenario in our field where an LCTM would be selected as the final model.

In conclusion, LCTMs are unrealistic, provide a bad fit for longitudinal data, result in autocorrelated errors, and can fail to identify the “correct” classes. As such, they are a problem in child growth and obesity research. The application of this type of modelling will hopefully improve over time as evidence arguing against the use of this heavily constrained type of GMM (without considering alternative specifications) is published. Unfortunately, much of the existing literature likely suffers from the default use of an LCTM and should be interpreted with caution.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The Millennium Cohort Study data are available from the UK Data Archive at beta.ukdataservice.ac.uk/datacatalogue/series/series?id=2000031

Additional information

Funding

WJ is supported by a UK Medical Research Council (MRC) New Investigator Research Grant (MR/P023347/1), and acknowledges support from the National Institute for Health Research (NIHR) Leicester Biomedical Research Centre, which is a partnership between University Hospitals of Leicester NHS Trust, Loughborough University, and the University of Leicester.

References

  • Gilthorpe MS, Dahly DL, Tu YK, Kubzansky LD, Goodman E. 2014. Challenges in modelling the random structure correctly in growth mixture models and the impact this has on model mixtures. J Dev Orig Health Dis. 5(3):197–205.
  • Johnson W. 2021. Modelling growth curves for epidemiology. In: Cameron N, Schell L, editors. Human growth and development. London (UK): Elsevier; 371–390.
  • Jung T, Wickrama KAS. 2008. An introduction to latent class growth analysis and growth mixture modeling. Soc Personal Psychol Compass. 2(1):302–317.
  • Mattsson M, Maher GM, Boland F, Fitzgerald AP, Murray DM, Biesma R. 2019. Group-based trajectory modelling for BMI trajectories in childhood: a systematic review. Obes Rev. 20(7):998–1015.
  • Mesidor M, Rousseau MC, O’Loughlin J, Sylvestre MP. 2022. Does group-based trajectory modeling estimate spurious trajectories? BMC Med Res Methodol. 22(1):194.
  • Nagin DS, Odgers CL. 2010. Group-based trajectory modeling in clinical research. Annu Rev Clin Psychol. 6:109–138.
  • Sijbrandij JJ, Hoekstra T, Almansa J, Peeters M, Bultmann U, Reijneveld SA. 2020. Variance constraints strongly influenced model performance in growth mixture modeling: a simulation and empirical study. BMC Med Res Methodol. 20(1):276.
  • Vachon DD, Krueger RF, Irons DE, Iacono WG, McGue M. 2017. Are alcohol trajectories a useful way of identifying at-risk youth? A multiwave longitudinal-epidemiologic study. J Am Acad Child Adolesc Psychiatry. 56(6):498–505.
  • van de Schoot R, Sijbrandij M, Winter SD, Depaoli S, Vermunt JK. 2017. The GRoLTS-checklist: guidelines for reporting on latent trajectory studies. Struct Equ Modeling Multidiscip J. 24(3):451–467.