Editorial

The importance of cluster analysis for enhancing clinical practice: an example from irritable bowel syndrome

Pages 94-96 | Received 20 Dec 2017, Accepted 18 Jan 2018, Published online: 15 Feb 2018

In clinical populations, substantial heterogeneity exists in patient characteristics, illness severity and treatment responses. A better understanding of such heterogeneity may lead to more effective and efficient treatment by personalising care to suit individual patient profiles. In this editorial, we argue that the statistical method of cluster analysis is a means by which such heterogeneity can be understood, potentially leading to improved care in mental health services. Because the method remains relatively under-utilised, we also consider the barriers to its use and implementation.

Cluster analysis is a statistical method that identifies subgroups defined by multiple characteristics. For example, in depression there is heterogeneity in age of onset (e.g. early versus late onset), exposure to life stress (Van den Berg et al., 2001) and severity of depression (e.g. mild, moderate or major depressive disorder) (Merikangas et al., 1994). Cluster analysis could help to identify subgroups within this patient population defined by age, stress exposure and depression severity simultaneously. Such analysis could have several benefits, including the development of diagnostic criteria, explanations of heterogeneous outcomes and the tailoring of treatments (Song & Jason, 2005; Taylor et al., 2001).

We use the word "subgroup" to refer to subsets of individuals within a given population that can be described using several characteristics. This usage should not be confused with what is traditionally called "subgroup analysis" in the clinical trials literature. There, the term refers to quantifying treatment responses in subsets of individuals identified by a single characteristic (e.g. a demographic or psychological variable). Such analysis may be conducted within the subgroup of interest, or via simple regression models that include interaction terms (moderation analysis) (Assmann et al., 2000). Rather than subgroup analysis in this sense, this editorial provides a brief overview of cluster analysis: how it can be used to identify subgroups, why such analysis is useful and how it might be applied in clinical practice. We use some recent results in irritable bowel syndrome to illustrate these points.

Diagnostic utility

Mental and physical health diagnostic criteria are often criticised as too restrictive or too broad, or for excluding important features of a given condition (Bentall & Pilgrim, 1999; Wakefield, 2010). In irritable bowel syndrome (IBS), there are four subgroups defined by a single parameter: the predominant bowel pattern. Individuals can be constipation predominant (IBS-C), diarrhoea predominant (IBS-D), alternating (IBS-A) or unclassified (IBS-U). However, assigning individuals to subgroups on the basis of one parameter may limit diagnostic utility and clinical relevance. Making the clinical profiles of IBS subgroups more multidimensional could help healthcare professionals make positive diagnoses of IBS rather than diagnoses by exclusion. Clearer and more valid diagnostic criteria would also reduce the costs associated with diagnosis by exclusion, which often involves additional consultations and unnecessary diagnostic procedures. Diagnosis by exclusion can also worsen the prognosis and illness trajectories of IBS patients. It leaves them uncertain about their condition and can lessen their trust in healthcare professionals (Spiegel et al., 2010).

Multidimensional IBS clinical profiles have been examined using mixture-modelling cluster analysis (Polster et al., 2017). The analysis included measures of bowel symptom type (IBS-C, IBS-D and IBS-A), symptom severity, extra-intestinal symptoms, anxiety and depression. Six subgroups were found, defined by bowel pattern subtype and further subdivided by high or low levels of somatic and psychological comorbidity. While supporting the distinctions between bowel patterns, the results indicate that somatic and psychological comorbidities are also important in distinguishing IBS subtypes. Furthermore, the groups were compared on symptom severity, anxiety and depression, and the high-comorbidity subgroups had significantly higher levels of all three than the low-comorbidity subgroups. Level of comorbidity therefore appears to be an important factor in distinguishing levels of symptom severity and psychological distress in IBS. More multidimensional subgrouping could thus help explain heterogeneity in outcomes that subgrouping by bowel pattern alone cannot.

How can cluster analyses improve treatment approaches?

The more comprehensive characterisation of subgroups provided by cluster analysis can help target treatments more precisely. For example, in IBS, cognitive behavioural therapy (CBT) is the primary recommended treatment approach (Drossman, 2016; Spence & Moss-Morris, 2007). CBT aims to change the unhelpful cognitions and behaviours that maintain symptoms. When assessing subgroups in IBS, a measure of the tendency to engage in such unhelpful cognitive and behavioural patterns can be included alongside other empirically informed measures (such as anxiety and bowel pattern subtype). This can indicate how CBT might best target each subgroup.

For example, cluster analysis might identify two hypothetical subgroups:
(1) individuals with IBS-D or IBS-A who show high levels of gastrointestinal avoidance behaviour and general anxiety;
(2) individuals with IBS-C who show high levels of safety behaviours and gastrointestinal (but not general) anxiety.
The profiles of these groups on the measures entered into the cluster analysis would then provide a basis for tailoring treatment. Group 1 may benefit from behavioural experiments designed to test their beliefs about the likelihood of having an accident in public, together with stress management training to reduce general anxiety. In contrast, group 2 may benefit from cognitive restructuring of fears about not passing stools and from behavioural exposure techniques to reduce anxiety specific to gastrointestinal sensations.

The efficacy of such tailoring could be tested in an experimental design that compares the tailored conditions with a control group using moderation analysis. In the context of randomised controlled trials, moderation analysis would test whether there is an interaction between cluster membership (i.e. subgroup) and treatment group. In this example, it would determine whether belonging to group 1 or group 2 affects response to the different treatment conditions.
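The moderation test described above can be sketched as a regression with a cluster × treatment interaction term. Everything here is hypothetical. The data are simulated with assumed effect sizes: the tailored treatment lowers follow-up symptom scores by 4 points in group 1 and by 1 point in group 2. Ordinary least squares via NumPy stands in for whatever model a real trial would pre-specify.

```python
# Illustrative sketch of moderation analysis on simulated (hypothetical) trial data.
import numpy as np

rng = np.random.default_rng(1)
n = 2000
cluster = rng.integers(0, 2, size=n)  # 0 = hypothetical group 1, 1 = hypothetical group 2
treat = rng.integers(0, 2, size=n)    # 0 = control, 1 = tailored CBT (hypothetical arm)

# Assumed data-generating model: treatment effect is -4 in group 1 and -4 + 3 = -1 in group 2
y = 1.0 - 4.0 * treat + 0.5 * cluster + 3.0 * treat * cluster + rng.normal(0, 2, size=n)

# Design matrix: intercept, treatment, cluster membership, treatment x cluster interaction
X = np.column_stack([np.ones(n), treat, cluster, treat * cluster])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Conventional OLS standard errors, used to form a t statistic for the interaction
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_interaction = beta[3] / se[3]

print("coefficients:", beta.round(2))
print("interaction t statistic:", round(float(t_interaction), 1))
```

The interaction coefficient estimates how much the treatment effect differs between the two subgroups. A clearly non-zero value indicates that treatment response depends on cluster membership. A real analysis would typically also adjust for baseline scores and other covariates.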

Methodological approaches to the identification of subgroups

Numerous approaches to identifying subgroups fall under the umbrella term "cluster analysis" (Nathan & Langenbucher, 2003). One of the most popular is finite mixture modelling, such as latent class analysis (LCA) (Stahl & Sallis, 2012). LCA assumes that a given dataset contains a mixture of scores drawn from different underlying latent classes (subgroups) (Stahl & Sallis, 2012). The approach infers the underlying distributions of the subgroups by identifying similar patterns of values and estimating the probability that each case belongs to each subgroup (Fraley & Raftery, 1998). Models with different numbers of subgroups are fitted to the data. A fit statistic such as the Bayesian Information Criterion (BIC) (Nylund et al., 2007), which rewards fit but penalises model complexity, is then used to identify the optimal number of subgroups that adequately explains the distribution of the data. For example, an analyst may fit one model with 5 subgroups (also termed clusters or classes) and one with 4 subgroups. The analyst then compares their BIC values, and the model with the lower BIC is preferred.
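The BIC takes the standard form below. Here $\hat{L}$ is the maximised likelihood, $n$ the sample size and $p$ the number of free parameters. The parameter count shown is an illustrative assumption: an LCA with $K$ classes and $J$ binary indicators has $K-1$ class proportions plus $KJ$ item-response probabilities. Other indicator types change $p$. The worked example uses hypothetical values ($J = 6$, $n = 500$).

```latex
\mathrm{BIC} = -2\ln\hat{L} + p\ln n, \qquad p = (K - 1) + KJ

% Worked example with J = 6 binary items and n = 500 cases (hypothetical values)
p_{K=4} = 3 + 24 = 27, \qquad p_{K=5} = 4 + 30 = 34
(p_{K=5} - p_{K=4})\ln 500 = 7 \times 6.215 \approx 43.5
```

So the 5-class model would be preferred only if it reduced $-2\ln\hat{L}$ by more than about 43.5 relative to the 4-class model.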

Selecting the number of subgroups with a formal criterion such as the BIC is a key advantage of LCA over other cluster-analytic methods. Distance-based cluster analysis, for example, relies on more arbitrary criteria. Other advantages of LCA include its ability to combine continuous and categorical measures to define subgroups, and to incorporate covariates and model directional relationships. This means LCA can control for potential effects of other background variables. When applied to prospective data, it can also be used to assess directional relationships (e.g. whether cluster membership predicts outcome) (Hagenaars & McCutcheon, 2002; Stahl & Sallis, 2012).
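To make the model-selection step concrete, here is a minimal sketch of LCA for binary indicators. Models with 1 to 4 classes are fitted by the expectation–maximisation (EM) algorithm and compared with the BIC. The data are simulated: two hypothetical classes with opposite endorsement patterns on six yes/no items. The helper names `fit_lca` and `bic`, the number of random starts and the convergence tolerance are all illustrative assumptions. This is not the procedure used by Polster et al. (2017) or by any particular software package, and dedicated latent class software would normally be used in practice.

```python
# Illustrative sketch: fit_lca and bic are hypothetical helpers, not a library API.
import numpy as np


def fit_lca(X, k, n_starts=10, n_iter=1000, tol=1e-6, seed=0):
    """Fit a k-class latent class model to binary data X (n cases x J items) by EM.

    Returns (log_likelihood, class_proportions, item_probabilities), keeping the
    best of several random starts to reduce the risk of a poor local maximum.
    """
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        pi = np.full(k, 1.0 / k)                             # P(class)
        theta = rng.uniform(0.2, 0.8, size=(k, X.shape[1]))  # P(item = 1 | class)
        prev_ll = -np.inf
        for _ in range(n_iter):
            # E-step: log P(responses, class) for every case and class
            log_joint = (np.log(pi) + X @ np.log(theta).T
                         + (1 - X) @ np.log(1 - theta).T)
            m = log_joint.max(axis=1, keepdims=True)
            log_norm = m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))
            ll = log_norm.sum()                   # log-likelihood of current parameters
            if ll - prev_ll < tol:
                break
            prev_ll = ll
            resp = np.exp(log_joint - log_norm[:, None])  # posterior class membership
            # M-step: re-estimate class proportions and item probabilities
            nk = resp.sum(axis=0) + 1e-12
            pi = nk / nk.sum()
            theta = np.clip(resp.T @ X / nk[:, None], 1e-6, 1 - 1e-6)
        if best is None or ll > best[0]:
            best = (ll, pi, theta)
    return best


def bic(ll, k, n_items, n):
    """BIC = -2 lnL + p ln n, with p = (k - 1) proportions + k * n_items item probabilities."""
    p = (k - 1) + k * n_items
    return -2.0 * ll + p * np.log(n)


# Simulated data: two hypothetical classes with opposite profiles on six binary items
rng = np.random.default_rng(42)
n = 1000
true_theta = np.array([[0.9, 0.9, 0.9, 0.1, 0.1, 0.1],
                       [0.1, 0.1, 0.1, 0.9, 0.9, 0.9]])
z = rng.integers(0, 2, size=n)  # hidden class of each case
X = (rng.random((n, 6)) < true_theta[z]).astype(float)

bics = {}
for k in range(1, 5):
    ll, pi, theta = fit_lca(X, k)
    bics[k] = bic(ll, k, X.shape[1], n)
best_k = min(bics, key=bics.get)  # the lowest BIC is preferred
for k, value in bics.items():
    print(f"{k} classes: BIC = {value:.1f}")
print("selected number of classes:", best_k)
```

Because the simulated data contain two classes, the two-class model should have the lowest BIC. With real clinical data the BIC values across models are often closer together, and the choice should also weigh other fit indices and the clinical interpretability of the classes.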

Barriers to implementation

Although cluster methods provide a powerful tool for understanding subgroups and differences in treatment response, they require careful consideration before implementation. Good statistical power is necessary for robust results (Lanza & Rhoades, 2013). The required sample size depends on the number of clusters being identified and the number of items/variables entered into the analysis (Dziak et al., 2014). There is no consensus on an adequate sample size. Previous research suggests that samples above 200 are necessary to achieve sufficient power (Tekle et al., 2016), and some authors suggest a minimum sample of 500 (Finch & Bronk, 2011).

A strong empirical and/or theoretical basis is also vital. It informs which measures to include in the analysis and whether the clusters identified make theoretical and clinical sense (Breckenridge, 1989). Replication is essential too. Because the models are derived from the distribution of a single dataset, further samples are needed to test their robustness (Milligan, 1996). It is therefore recommended to use either two datasets or a dataset large enough to be split into two samples. One sample is then used as the "training" dataset and the other as the validation sample (Everitt et al., 2001). Only once the identification of subgroups within a given population has been replicated (cross-validated) in multiple samples would there be a strong enough basis for updating existing diagnostic criteria and informing practice.
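The split-sample recommendation can be illustrated with a short sketch, offered as one simple way to operationalise it rather than a standard procedure. The code includes a compact EM fit for a two-class latent class model on simulated binary data, with two hypothetical classes showing opposite profiles on six items. The model is fitted separately to a random training half and to the validation half. The validation cases are then classified under both solutions, and the code reports the proportion assigned to matching subgroups. The helper names (`posterior`, `fit_lca`, `agreement`), the seeds and the agreement measure are illustrative assumptions. The measure is simple percentage agreement after matching arbitrary class labels; a published analysis might instead use a chance-corrected index such as Cohen's kappa.

```python
# Illustrative sketch: posterior, fit_lca and agreement are hypothetical helpers.
import itertools

import numpy as np


def posterior(X, pi, theta):
    """Posterior class-membership probabilities and total log-likelihood."""
    log_joint = np.log(pi) + X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
    m = log_joint.max(axis=1, keepdims=True)
    log_norm = m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))
    return np.exp(log_joint - log_norm[:, None]), log_norm.sum()


def fit_lca(X, k, n_starts=5, n_iter=500, tol=1e-6, seed=0):
    """EM for a k-class latent class model with binary items; returns (pi, theta)."""
    rng = np.random.default_rng(seed)
    best_ll, best = -np.inf, None
    for _ in range(n_starts):
        pi = np.full(k, 1.0 / k)
        theta = rng.uniform(0.2, 0.8, size=(k, X.shape[1]))
        prev_ll = -np.inf
        for _ in range(n_iter):
            post, ll = posterior(X, pi, theta)  # E-step
            if ll - prev_ll < tol:
                break
            prev_ll = ll
            nk = post.sum(axis=0) + 1e-12       # M-step
            pi = nk / nk.sum()
            theta = np.clip(post.T @ X / nk[:, None], 1e-6, 1 - 1e-6)
        if ll > best_ll:
            best_ll, best = ll, (pi, theta)
    return best


def agreement(a, b, k):
    """Share of cases with matching labels, maximised over relabellings of a."""
    return max(np.mean(np.array(perm)[a] == b)
               for perm in itertools.permutations(range(k)))


# Simulated data: two hypothetical classes on six binary items
rng = np.random.default_rng(7)
n, k = 1000, 2
true_theta = np.array([[0.9, 0.9, 0.9, 0.1, 0.1, 0.1],
                       [0.1, 0.1, 0.1, 0.9, 0.9, 0.9]])
z = rng.integers(0, k, size=n)
X = (rng.random((n, 6)) < true_theta[z]).astype(float)

# Random split into a training half and a validation half
idx = rng.permutation(n)
train, valid = X[idx[: n // 2]], X[idx[n // 2:]]

pi_tr, theta_tr = fit_lca(train, k)
pi_va, theta_va = fit_lca(valid, k, seed=1)

# Classify validation cases with the training solution and with their own solution
labels_train_model = posterior(valid, pi_tr, theta_tr)[0].argmax(axis=1)
labels_valid_model = posterior(valid, pi_va, theta_va)[0].argmax(axis=1)
rep_agreement = agreement(labels_train_model, labels_valid_model, k)
print(f"agreement between solutions: {rep_agreement:.3f}")
```

High agreement suggests the subgroup structure found in the training half replicates in the validation half. Low agreement would warn that the solution may reflect idiosyncrasies of one sample.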

Conclusion

Cluster analysis is important for understanding the heterogeneity of clinical disorders, particularly those that challenge customary distinctions between physical and psychological aetiology. Cluster analysis methods can help refine diagnostic criteria by providing more comprehensive and clinically meaningful profiles within a condition. In IBS, this currently involves considering psychological factors such as anxiety. In future, it could extend to a wider approach that includes cognitive and behavioural factors. Cluster analysis also has the potential to improve our understanding of differential treatment responses across patient subgroups and to support more personalised treatment that enhances recovery.

Declaration of interest

The authors declare no conflict of interest.

References

  • Assmann SF, Pocock SJ, Enos LE, Kasten LE. (2000). Subgroup analysis and other (mis)uses of baseline data in clinical trials. Lancet, 355, 1064–9
  • Bentall D, Pilgrim R. (1999). The medicalisation of misery: A critical realist analysis of the concept of depression. J Mental Health, 8, 261–74
  • Breckenridge JN. (1989). Replicating cluster analysis: Method, consistency, and validity. Multivariate Behav Res, 24, 147–61
  • Drossman DA. (2016). Functional gastrointestinal disorders: History, pathophysiology, clinical features, and Rome IV. Gastroenterology, 150, 1262–79. e1262
  • Dziak JJ, Lanza ST, Tan X. (2014). Effect size, statistical power, and sample size requirements for the bootstrap likelihood ratio test in latent class analysis. Struct Equ Modeling, 21, 534–52
  • Everitt B, Landau S, Leese M, Stahl D. (2001). Cluster analysis. London: Arnold
  • Finch WH, Bronk KC. (2011). Conducting confirmatory latent class analysis using Mplus. Struct Equ Modeling, 18, 132–51
  • Fraley C, Raftery AE. (1998). How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J, 41, 578–88
  • Hagenaars JA, McCutcheon AL. (2002). Applied latent class analysis. Cambridge: Cambridge University Press
  • Lanza ST, Rhoades BL. (2013). Latent class analysis: An alternative perspective on subgroup analysis in prevention and treatment. Prev Sci, 14, 157–68
  • Merikangas KR, Wicki W, Angst J. (1994). Heterogeneity of depression. Classification of depressive subtypes by longitudinal course. Br J Psychiatry, 164, 342–8
  • Milligan GW. (1996). Clustering validation: Results and implications for applied analyses. In: De Soete G, Arabie P, Hubert LJ, eds. Clustering and classification. River Edge, NJ: World Scientific Publ., 341–75
  • Nathan PE, Langenbucher J. (2003). Diagnosis and classification. In: Stricker G, Widiger TA, eds. Handbook of psychology: Clinical psychology (Vol. 8). New York: Wiley, 3–26
  • Nylund KL, Asparouhov T, Muthén BO. (2007). Deciding on the number of classes in latent class analysis and growth mixture modeling: A Monte Carlo simulation study. Struct Equ Modeling, 14, 535–69
  • Polster A, Van Oudenhove L, Jones M, et al. (2017). Mixture model analysis identifies irritable bowel syndrome subgroups characterised by specific profiles of gastrointestinal, extraintestinal somatic and psychological symptoms. Aliment Pharmacol Ther, 46, 529–39
  • Song S, Jason LA. (2005). A population-based study of chronic fatigue syndrome (CFS) experienced in differing patient groups: An effort to replicate Vercoulen et al.'s model of CFS. J Mental Health, 14, 277–89
  • Spence MJ, Moss-Morris R. (2007). The cognitive behavioural model of irritable bowel syndrome: A prospective investigation of patients with gastroenteritis. Gut, 56, 1066–71
  • Spiegel BM, Farid M, Esrailian E, et al. (2010). Is irritable bowel syndrome a diagnosis of exclusion?: A survey of primary care providers, gastroenterologists, and IBS experts. Am J Gastroenterol, 105, 848–58
  • Stahl D, Sallis H. (2012). Model‐based cluster analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 4, 341–58
  • Taylor RR, Leonard JA, Schoeny ME. (2001). Evaluating latent variable models of functional somatic distress in a community-based sample. J Mental Health, 10, 335–49
  • Tekle FB, Gudicha DW, Vermunt JK. (2016). Power analysis for the bootstrap likelihood ratio test for the number of classes in latent class models. Adv Data Anal Classif, 10, 209–24
  • Van den Berg MD, Oldehinkel AJ, Bouhuys AL, et al. (2001). Depression in later life: Three etiologically different subgroups. J Affect Disord, 65, 19–26
  • Wakefield JC. (2010). Misdiagnosing normality: Psychiatry's failure to address the problem of false positive diagnoses of mental disorder in a changing professional environment. J Mental Health, 19, 337–51
