Editorial

Models for prediction of death in systemic sclerosis: current perspectives and future directions

Pages 391-393 | Published online: 10 Jan 2014

The ability to predict death or, conversely, to estimate survival is of primary importance where chronic and life-threatening diseases are concerned. It fulfils both the needs of clinical practitioners, who must better allocate efforts and resources toward those at risk of an unfavorable outcome, and the expectations of patients, who wish to have access to accurate prognostic information. The necessary means to make a prognostication are usually provided by an array of statistical algorithms, each with different qualities and inductive biases. We must, however, bear in mind that the tool we use to answer a research question must be appropriate for the purpose and that the derived information must be adequately contextualized.

Survival analysis (failure-time or time-to-event analysis) is used to describe the occurrence of an event (i.e., death) within a specified time-frame and the effect of variables on survival times. The interpretation of survival data is not always straightforward; although users may be tempted to assume, simplistically, that time-to-event analysis will always provide predictions, they should know that these statistical approaches provide conceptually different information: inference. A prediction can be roughly defined as ‘a guess of what will happen in the future’, whilst inference is ‘the reading of all the clues and available data to draw conclusions about the relationship between variables and outcomes’. Inferential reasoning allows an exhaustive use of the existing information and prior knowledge; unlike predictions, however, the derived conclusions may not be answerable at the end of the observation time, yielding results that cannot be easily translated into daily clinical practice. When the interest is the construction of accurate prognostic models, we should therefore exploit the information gathered by inferential analysis to make the most reliable predictions. The path that leads from inference to prediction is a mandatory, yet seldom undertaken, step in prognostication and in the construction of survival models.

In the biomedical field, failure-time analysis is mostly performed with either the Kaplan–Meier method or Cox regression. The Kaplan–Meier method allows the estimation of time-dependent survival probabilities in a population or in two or more subgroups of a population, yet it ignores the possible effect of covariates on survival. This limitation is overcome by Cox regression, whose output measure is the hazard ratio, an expression of the hazard of the event in one group (e.g., exposed to treatment) relative to the hazard in another group (e.g., controls); the hazard itself is the instantaneous risk that an individual who has not yet experienced the event at a certain time will experience it at that time. Whilst these approaches provide valuable information, they also have important limitations and, if not properly used or interpreted, they may produce highly biased results and misleading conclusions Citation[1–3]. Moreover, these methods do not provide a discriminative but, rather, a separative output: discrimination quantifies the ability of a model to correctly classify subjects into one of two categories (i.e., events and non-events), whilst separation evaluates the divergence of the survival characteristics associated with a variable. When the aim is to move beyond inference and toward prediction, we must, however, rely on measures of discrimination.
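As a minimal illustration of this inference-oriented, separative output, the sketch below fits a Kaplan–Meier curve and a Cox model with the freely available R ‘survival’ package; the bundled ‘lung’ data set and the chosen covariates are stand-ins for illustration only, not SSc data.

```r
# Illustrative sketch (assumes the 'survival' package and its bundled 'lung' data)
library(survival)

# Kaplan–Meier: time-dependent survival probabilities by subgroup (here, sex),
# ignoring the effect of any other covariates
km <- survfit(Surv(time, status) ~ sex, data = lung)
summary(km, times = c(365, 730))   # survival probabilities at 1 and 2 years

# Log-rank test: assesses *separation* of the two survival curves
survdiff(Surv(time, status) ~ sex, data = lung)

# Cox regression: hazard ratios adjusted for covariates
cox <- coxph(Surv(time, status) ~ sex + age + ph.ecog, data = lung)
summary(cox)   # the exp(coef) column gives the hazard ratios
```

Neither output, by itself, tells us how well individual patients are classified as future events or non-events; that requires the discriminative measures discussed below.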

In the last two decades, a number of articles dealing with survival/mortality in different populations of systemic sclerosis (SSc) patients have been published. Of these, three specifically focused their attention on modeling and prognostication, whilst the remainder described a number of risk factors significantly associated with mortality, whose main merit is the reproducibility of results across different populations Citation[4]. Yet, as with any form of inferential reasoning, it may be difficult to translate the categorization ‘low’ or ‘high’ risk into a prediction such as ‘surviving’ or ‘not surviving’. This difficulty is compounded when the interactions among an array of risk factors are taken into account. What, then, should we expect for a patient who bears one, two or more mortality risk factors, alone or in combination with one or more variables with a protective value? The studies by Bryan et al. Citation[5] and Scussel-Lonzetti et al. Citation[6] represent a first tentative step in this direction, yet ultimately fail to solve this vexed question owing to conceptual and methodological mistakes. Although the authors used a discriminative measure to perform prognostication, the absence of a correct internal validation process meant that the results were, predictably, neither repeatable nor generalizable. Indeed, without proper internal validation, the expected mortality increases as the number of predictors grows, a clear case of the model adapting to the data (overfitting) Citation[7]. Moreover, the authors investigated only the additive effects of attributes, without any indication of how these variables interact; the significance of the results may thus be overestimated as well as difficult to interpret.

We recently tried to overcome these limits by developing and validating a fully non-parametric 5-year survival model in SSc patients Citation[8]. With the use of advanced computational data-mining techniques, it was possible to build a predictive model that accounted for both the linear and the nonlinear interaction of variables; notably, this model was internally validated and tested on an independent external population, ensuring the repeatability of the predictions. The model also outperformed experienced clinicians’ predictions owing to its capacity to analyze patients’ data objectively and to reach unbiased conclusions, demonstrating for the first time that it is possible to make accurate individual predictions in SSc patients. Although this mortality model can claim statistical solidity, logical simplicity and biological plausibility, it is partially flawed by an intrinsic bias because censoring was ignored. Omitting subjects with short follow-up or ignoring the survival time may indeed produce upwardly biased estimates of failure and/or may cause the search algorithm to suffer from information loss due to the reduced sample size.
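To make the point about censoring concrete, the toy simulation below (not the published model; the event and censoring distributions are arbitrary assumptions) shows how discarding subjects censored before 5 years inflates the crude 5-year mortality estimate relative to both the true value and a Kaplan–Meier estimate that uses the censored follow-up correctly.

```r
# Toy simulation of the upward bias introduced by ignoring censoring
library(survival)
set.seed(1)

n      <- 2000
event  <- rexp(n, rate = 0.08)        # true event times (years)
censor <- runif(n, 0, 10)             # censoring times (end of follow-up)
time   <- pmin(event, censor)
status <- as.numeric(event <= censor) # 1 = death observed, 0 = censored

true_mort5 <- 1 - exp(-0.08 * 5)      # true 5-year mortality under this model

# "Naive" estimate: drop subjects censored before 5 years, then take the
# crude proportion of observed deaths within 5 years among those who remain
keep  <- !(status == 0 & time < 5)
naive <- mean(status[keep] == 1 & time[keep] <= 5)

# Kaplan–Meier estimate, which accounts for the censored follow-up
km  <- survfit(Surv(time, status) ~ 1)
km5 <- 1 - summary(km, times = 5)$surv

print(round(c(true = true_mort5, naive = naive, kaplan_meier = km5), 3))
```

Under these assumptions the naive estimate markedly overstates 5-year mortality, whereas the Kaplan–Meier estimate tracks the true value.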

In light of these considerations, it is clear that the ideal predictive death model in SSc has not yet been constructed. Such a model should be capable of performing a full model-free analysis, taking into account all the possible interactions among variables, generating time-dependent predictions based on discriminative measures that do not ignore censoring and, of course, being supported by an appropriate internal validation. Is it possible to build such a model? Do we have at hand the tools that satisfy these requirements, or is this process just a chimera? It is encouraging to know that everything we need for this purpose is already available. First, a number of data-mining techniques have been successfully adapted to survival analysis. The advantage of these methods is their non-parametric nature, their ability to handle a large number of candidate variables even with a limited number of cases, and their capacity to model interactions. Examples of such methods include ‘survival trees’ and ‘random survival forests’, whose implementations are available as free R software packages. Second, discriminative measures to assess the accuracy of death models in the presence of censored data are available; these include Harrell’s C-index Citation[9], the Brier score for censored data Citation[10] and time-dependent ROC curves Citation[11]. Finally, recent works have demonstrated the need for correct internal validation (i.e., cross-validation) for parameter tuning and for evaluating the generalization capability of survival classifiers Citation[12,13]. These very same methodologies were anticipated and applied in the Survival Dimensionality Reduction (SDR) kernel, a fully non-parametric, model-free data-mining method for the analysis of high-dimensional survival data, which makes extensive use of the Brier score for censored data as a discriminative measure and of cross-validation to select and validate the most predictive model Citation[14].
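As a sketch of how such a pipeline could look, the fragment below fits a random survival forest with the freely available R package ‘randomForestSRC’ on its bundled ‘veteran’ example data (a stand-in for an SSc cohort, not the data analyzed here) and reads off the out-of-bag Harrell’s C-index; the Brier score for censored data and a full cross-validation loop could be layered on top, for example via the ‘pec’ package.

```r
# Sketch only: random survival forest with out-of-bag discrimination
library(randomForestSRC)
library(survival)

data(veteran, package = "randomForestSRC")

# Non-parametric, model-free fit: handles many candidate variables and their
# interactions without a proportional-hazards assumption
rsf <- rfsrc(Surv(time, status) ~ ., data = veteran,
             ntree = 500, importance = TRUE)

# The out-of-bag prediction error equals 1 - Harrell's C-index,
# a discrimination measure that accounts for censoring
cat("OOB Harrell's C-index:", 1 - tail(rsf$err.rate, 1), "\n")

# Variable importance: a first look at which predictors drive discrimination
print(head(sort(rsf$importance, decreasing = TRUE)))
```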

In conclusion, survival analysis has a long-standing history in SSc, but it has seldom been used for prognostication and for the construction of reliable survival/death models. With the exception of a few, and not always fully satisfactory, attempts, clinical interpretation and application have been left to the reader, who, unfortunately, was not given the means to judge the practical quality and effectiveness of the results. If we wish to provide a meaningful prognostic tool to be used at the bedside or to drive interventional clinical trials, we will have to rely on proper instruments that allow a full analysis of the data and a correct evaluation of the results. Albeit computationally demanding and conceptually complex, these tools are already available and their application in the SSc field is strongly advisable.

Financial & competing interests disclosure

The authors have no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.

No writing assistance was utilized in the production of this manuscript.

References

  1. Hernán MA. The hazards of hazard ratios. Epidemiology 21(1), 13–15 (2010).
  2. Concato J, Peduzzi P, Holford TR et al. Importance of events per independent variable in proportional hazards analysis. I. Background, goals, and general strategy. J. Clin. Epidemiol. 48(12), 1495–1501 (1995).
  3. Mathew A, Pandey M, Murthy NS. Survival analysis: caveats and pitfalls. Eur. J. Surg. Oncol. 25(3), 321–329 (1999).
  4. Karassa FB, Ioannidis JP. Mortality in systemic sclerosis. Clin. Exp. Rheumatol. 26(5 Suppl. 51), S85–S93 (2008).
  5. Bryan C, Knight C, Black CM et al. Prediction of five-year survival following presentation with scleroderma: development of a simple model using three disease factors at first visit. Arthritis Rheum. 42(12), 2660–2665 (1999).
  6. Scussel-Lonzetti L, Joyal F, Raynauld JP et al. Predicting mortality in systemic sclerosis: analysis of a cohort of 309 French Canadian patients with emphasis on features at diagnosis as predictive factors for survival. Medicine (Baltimore) 81(2), 154–167 (2002).
  7. Concato J, Feinstein AR, Holford TR. The risk of determining risk with multivariable models. Ann. Intern. Med. 118(3), 201–210 (1993).
  8. Beretta L, Santaniello A, Cappiello F et al. Development of a five-year mortality model in systemic sclerosis patients by different analytical approaches. Clin. Exp. Rheumatol. 28(2 Suppl. 58), 18–27 (2010).
  9. Harrell FE Jr, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat. Med. 15(4), 361–387 (1996).
  10. Graf E, Schmoor C, Sauerbrei W et al. Assessment and comparison of prognostic classification schemes for survival data. Stat. Med. 18(17–18), 2529–2545 (1999).
  11. Heagerty PJ, Lumley T, Pepe MS. Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics 56(2), 337–344 (2000).
  12. Simon RM, Subramanian J, Li MC et al. Using cross-validation to evaluate predictive accuracy of survival risk classifiers based on high-dimensional data. Brief Bioinform. 12(3), 203–214 (2011).
  13. Subramanian J, Simon R. An evaluation of resampling methods for assessment of survival risk prediction in high-dimensional settings. Stat. Med. 30(6), 642–653 (2011).
  14. Beretta L, Santaniello A, van Riel PL et al. Survival dimensionality reduction (SDR): development and clinical application of an innovative approach to detect epistasis in presence of right-censored data. BMC Bioinformatics 11, 416 (2010).
