
SAPS 3 is not superior to SAPS 2 in cardiac surgery patients

Pages 111-119 | Received 08 Dec 2013, Accepted 28 Jan 2014, Published online: 19 Mar 2014

Abstract

Objectives. Cardiac surgery patients are excluded from SAPS2 but included in SAPS3. Neither score has been evaluated for this specific population; however, both are used daily. We hypothesized that SAPS3 may be superior to SAPS2 in outcome prediction in cardiac surgery patients. Design. All consecutive patients undergoing cardiac surgery between January 2007 and December 2010 were included in our prospective study. Both models were tested with calibration and discrimination statistics. We compared the AUC of the ROC curves by DeLong's method and calculated OCC values. Results. A total of 5207 patients with mean age of 67.2 ± 10.9 years were admitted to the ICU. The mean length of ICU stay was 4.6 ± 7.0 days and the ICU mortality was 5.9%. The two tested models had acceptable discriminatory power (AUC: SAPS2: 0.777–0.875; SAPS3: 0.757–0.893). SAPS3 had a low AUC and poor calibration on admission day. SAPS2 had poor calibration on Days 1–6 and 8. Conclusions. Despite including cardiac surgery patients, SAPS3 was not superior to SAPS2 in our analysis. In this large cohort of ICU cardiac surgery patients, performance of both SAPS models was generally poor. In this subset of patients, neither scoring system is recommended.

Introduction

Several scoring systems are available today to estimate the probability of mortality for intensive care unit (ICU) patients. These scores can be used to stratify resource utilization and to monitor quality improvement within an institution. Given the economic aspect of medicine, an objective outcome prediction is important (Citation1).

The current ICU systems were developed several decades ago. In 1993, Le Gall et al. published the Simplified Acute Physiology Score (SAPS) 2 (Citation2) (Table I). This system showed a lack of prognostic performance, especially in calibration analysis (Citation3,Citation4). To overcome this problem, the study group for SAPS3 (Table II) collected data about risk factors and outcomes in a heterogeneous cohort of ICU patients. Their objective was to develop a new, improved model for risk adjustment that considers the results of previous studies and incorporates missing variables which are relevant for accurate outcome prediction (Citation3,Citation4).

Table I. SAPS 2.

Table II. SAPS 3.

Although cardiac surgery patients were excluded from the SAPS2 database, this score is among the most widely used systems in cardiac ICUs. Cardiac surgery patients represent a difficult population for postoperative scoring systems. They show temporary pathophysiological consequences related to the heart–lung machine, which can influence the variables used in postoperative scoring systems (Citation5). When Metnitz et al. and Moreno et al. developed SAPS3 in 2005 on the basis of a worldwide cohort, postoperative cardiac surgery patients were included in the population (Citation3,Citation4).

The first external validation of SAPS3 in surgical ICU patients was performed by Sakr et al. in 2008 (Citation6). This validation was carried out in a mixed-case cohort of 1851 surgical patients, including 26% of cardiac surgery cases. In 2011 we postulated SAPS2 to be a poor prognostic tool in cardiac surgery (Citation7). The performance of SAPS3 has not been validated in a purely cardiac surgery ICU population so far. Since the SAPS3 development database includes cardiac surgery patients, we hypothesized that SAPS3 would be superior to SAPS2 in this study of cardiac surgery patients. The aim of this study was to evaluate SAPS3 in 5207 cardiac surgery patients and to compare its predictive ability to that of SAPS2.

Material and methods

General information

The study was based on an evaluation of prospectively collected data of all consecutive adult patients admitted to our ICU after cardiac surgery between 1 January 2007 and 31 December 2010. It was approved by the Institutional Review Board of our university (approval number: 2809-05/10). We considered only the first admission for patients who were readmitted to the ICU during the study period. Demographic data as well as the routinely calculated preoperative EuroSCORE were collected from our quality control system QIMS 2.5 (University Hospital of Muenster, Germany). All laboratory data were extracted from our intensive care information system COPRA 5.24 (COPRASYSTEM GmbH, Sasbachwalden, Germany), which is interfaced with the patient monitors (Philips IntelliVue MP70, Amsterdam, the Netherlands), ventilators (Draeger Evita IV, Luebeck, Germany and Hamilton Galileo, Bonaduz, Switzerland), blood gas analysis devices (ABL 800Flex Radiometer, Copenhagen, Denmark), and the central laboratories. The attending physician collected the data for the entire population. Two assigned medical clerks validated the data collection daily. A senior consultant performed a second periodical validation. Inconsistency between the evaluators was resolved by consensus. The outcome was defined as ICU mortality.

Statistical analyses

The statistical analyses were performed using SPSS (SPSS Inc, Chicago, IL, USA) and SAS (SAS Institute Inc, Cary, NC, USA). Continuous scale data are presented as mean ± standard deviation. A value of p < 0.05 was considered significant.

Discrimination (the ability of a scoring model to differentiate between survivors and non-survivors) was evaluated with receiver operating characteristic (ROC) curves, which plot the sensitivity (true-positive rate) against 1 – specificity (false-positive rate). The area under the curve (AUC) summarizes this discriminative ability: an AUC of 0.5 (a diagonal line) is equivalent to random chance, whereas an AUC of 1.0 implies perfect discrimination. Comparison of ROC curves was performed using the method of DeLong. Overall correct classification (OCC) values (the ratio of correctly predicted survival and mortality to the total number of patients) were calculated. Calibration (agreement between the estimated probability of mortality and the actual observed mortality) was analyzed with the Hosmer and Lemeshow goodness-of-fit test (HL). A high chi-square value indicates poor calibration, especially with a related p value < 0.05.
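The three measures described above can be illustrated with a minimal Python sketch. It is not the SPSS or SAS procedure used in this study. The function names, the OCC cutoff of 0.5, the use of ten equal-sized risk groups, and the default of groups minus two degrees of freedom are assumptions made for illustration, since none of these settings is stated in the text. The sketch also assumes predicted probabilities strictly between 0 and 1 and at least as many patients as groups.

```python
import math

def auc(probs, died):
    """AUC as the chance that a non-survivor has a higher predicted risk
    than a survivor (ties count as one half)."""
    pos = [p for p, d in zip(probs, died) if d]
    neg = [p for p, d in zip(probs, died) if not d]
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in pos for y in neg)
    return wins / (len(pos) * len(neg))

def occ(probs, died, cutoff=0.5):
    """Overall correct classification; the cutoff is an assumed value."""
    correct = sum((p >= cutoff) == bool(d) for p, d in zip(probs, died))
    return correct / len(probs)

def chi2_sf_even(x, df):
    """Upper-tail chi-square probability, closed form valid for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def hosmer_lemeshow(probs, died, groups=10, df=None):
    """HL chi-square over equal-sized risk groups, with its p value."""
    pairs = sorted(zip(probs, died))
    n = len(pairs)
    chi2 = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        expected = sum(p for p, _ in chunk)   # predicted deaths in group
        observed = sum(d for _, d in chunk)   # actual deaths in group
        chi2 += (observed - expected) ** 2 / (
            expected * (1.0 - expected / len(chunk)))
    if df is None:
        df = groups - 2  # assumed convention, not stated in the article
    return chi2, chi2_sf_even(chi2, df)
```

A large chi-square with p < 0.05 from `hosmer_lemeshow` would correspond to the poor calibration described above, while `auc` and `occ` describe discrimination and classification accuracy, respectively.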

Daily risk scoring was supported by previous studies to monitor physiological dysfunction and to allow prognoses and therapeutic decisions to be discussed or reconsidered (Citation7). Statistical analyses were restricted to ICU Day 1 (operative day) (n = 5207) through Day 8 (n = 601), so that the results would not be distorted by small numbers of patients. The number of patients at risk on each day can be found in the first column of Table IV.

Table III. Patients’ data and operations during the study period.

Table IV. Statistical results.

Results

Population characteristics

This study includes 5207 ICU patients admitted over a period of 4 years; 37.6% were female. The mean age was 67.2 ± 10.9 years. The mean length of ICU stay was 4.6 ± 7.0 days and ICU mortality was 5.9% (n = 305). The preoperative mean additive EuroSCORE was 6.3 ± 3.7 and mean logistic EuroSCORE was 9.9 ± 12.9. The types of surgical procedures, the patients’ data, the distribution of length of ICU stay, and the respective mortality are shown in Table III.

Results of statistical analyses

Table IV summarizes the OCC, the discriminatory power, the comparison of the two models’ AUCs by DeLong's method, and the calibration of both SAPS models. Both scores had acceptable discrimination on all days, with AUCs of ≥ 0.777 for SAPS2 and ≥ 0.757 for SAPS3. The best results of both scores were on Day 2 (SAPS2: 0.875; SAPS3: 0.893). SAPS3 discriminated the poorest on Day 1, whereas SAPS2 showed the poorest discrimination on Day 8. The DeLong analysis showed a significant p value in favor of the SAPS2 model on Days 1, 4, and 5, whereas SAPS3 was significantly superior on Days 6 and 7. For ICU Days 2, 3, and 8 the differences between the AUCs of the two models were not significant. In general, SAPS2 showed better AUCs during the early postoperative phase. Figure 1 shows the ROC curves of both models on the first two postoperative days. The best OCC level for SAPS2 was on the first day, with 94.4%, and for SAPS3, the best OCC level was on the second day, with 95.2%. However, both models showed the minimum OCC value on Day 8 (SAPS2: 78.9%; SAPS3: 82.7%). The calibration of SAPS2 was inadequate, with significant p values in the HL test on Days 1–6 and 8. In contrast, SAPS3 showed a significant p value only on Days 1 and 3.
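The pairwise AUC comparison reported above relies on DeLong's nonparametric test for two correlated ROC curves, that is, two scores measured on the same patients. The following Python sketch shows the general form of that test under stated assumptions. It is not the software routine used in this study, and the function names and the toy inputs in the usage note are hypothetical.

```python
import math

def _psi(x, y):
    # Kernel of the Mann-Whitney statistic: 1 if the non-survivor scores
    # higher than the survivor, 0.5 on a tie, 0 otherwise.
    return 1.0 if x > y else 0.5 if x == y else 0.0

def _components(scores, died):
    # Structural components: one value per non-survivor (v10) and one
    # value per survivor (v01).
    pos = [s for s, d in zip(scores, died) if d]
    neg = [s for s, d in zip(scores, died) if not d]
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01

def _cov(a, b):
    # Sample covariance with an (n - 1) denominator.
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def delong_test(scores_a, scores_b, died):
    """Return (auc_a, auc_b, z, two-sided p) for two scores that were
    recorded on the same patients."""
    v10a, v01a = _components(scores_a, died)
    v10b, v01b = _components(scores_b, died)
    m, n = len(v10a), len(v01a)
    auc_a, auc_b = sum(v10a) / m, sum(v10b) / m
    # Variance of the AUC difference, including the covariance term that
    # accounts for both scores coming from the same patients.
    var = ((_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / m
           + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / n)
    if var <= 0.0:
        raise ValueError("variance of the AUC difference is zero")
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value
    return auc_a, auc_b, z, p
```

Calling `delong_test(saps2_scores, saps3_scores, died)` once per ICU day, with hypothetical per-day lists of scores and outcomes, would yield one p value per day in the manner of the comparison described above.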

Figure 1. SAPS 2 and SAPS 3 on ICU Days 1 and 2.


We analyzed the AUC of both EuroSCORE models on the total population of 5207 patients. The preoperative additive EuroSCORE had an AUC of 0.734 (95%-confidence interval: 0.699–0.769) and the preoperative logistic EuroSCORE of 0.732 (95%-confidence interval: 0.697–0.767). No gender-based differences or missing data were present in this study.

Discussion

General information

ICU scoring systems have found their place in critical care medicine. A scoring system can report an individual patient's severity of illness on the one hand and, on the other, the impact of ICU-related factors such as ICU organization and management on patient outcome (Citation3). The reliability of a scoring system can be assessed using calibration and discrimination tests, considered by the European Society of Intensive Care Medicine (ESICM) to be the best methods to validate scoring systems and prognostic parameters. It is known that no model can have perfect calibration and perfect discrimination at the same time. When the focus is on an individual patient, discrimination is important; however, for clinical trials or for the comparison of ICUs, good calibration is mandatory (Citation6).

To our best knowledge, this is the first external validation of the global SAPS3 admission model on a large database of an independent population of 5207 cardiac surgery patients. We think it is necessary to inform the reader that our ICU was one of four German centers that contributed patient data to the original SAPS3 database.

SAPS2

SAPS2 was developed in 1993 (Citation2) based on a European/North American database, which includes 13,152 patients. Cardiac surgery patients were originally excluded from the score's target. Nevertheless, the score is currently used in many cardiac surgery ICUs. SAPS2 focuses on data from the first 24 h after ICU admission (Citation4). The score has been extensively validated in external studies in different patient cohorts during the last two decades. The general findings in these studies were good discrimination but poor calibration, and significant differences between observed and expected mortality rates (Citation3,Citation8–11). In the year 2011, we were able to confirm the poor calibration of SAPS2 in a purely cardiac surgical population (Citation7). The performance of SAPS2 was tested in the original SAPS3 paper, where the score also showed acceptable discrimination but lack of calibration.

There are several potential reasons for these findings, such as user-dependent problems as well as patient-dependent and model-dependent problems (Citation3). A main bias might be the old database of SAPS2 from the early 1990s. Since then, there have been changes in the prevalence of major diseases and in the availability and use of major diagnostic and therapeutic methods (Citation4,Citation12). All that leads to a poor calibration of the old model. Furthermore, the SAPS2 database includes only patients from Europe and North America, which limits the ability to represent the reality of intensive care medicine in the rest of the world (Citation4).

SAPS3

SAPS3 was published in 2005 (Citation3,Citation4), in order to have a new, improved model for risk adjustment and to overcome the shortcomings related to different case mixes and lead-time bias of SAPS2. The development of SAPS3 was based on the largest prospective, epidemiological, multicenter, multinational study with a high-quality population of 16,784 patients consecutively admitted to 303 ICUs from 35 countries around the world. Heterogeneity is reflected in an ICU case mix that also includes cardiac surgery patients. SAPS3 considers only physiological data within the first hour after ICU admission. Beside a global model, several regional models, which were also derived from the original database during the developmental phase of SAPS3, were presented in 2005. The global model allows a broad comparison of worldwide ICUs, but that obviously has less relevance to local conditions. However, the different regional models focus on higher local accuracy, since they are related to the origin of the contributed data.

In the original SAPS3 manuscript, Moreno et al. reported good discrimination (AUC: 0.848) and satisfactory calibration (Chi²: 10.56; p value: 0.39) of the global model, but found a poor fitting of the Central and Western Europe ICUs in mortality prediction. Hence, it is not surprising that the SAPS3 calibration and discrimination set was shown to vary widely around the world (Citation13). The SAPS3 28-day score was published by the official SAPS3 study group in 2008 as a new model that was derived from the original database to provide a specific long-term outcome prediction instrument (Citation14).
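For orientation, both SAPS2 and the global SAPS3 admission model convert the score into a predicted probability of death through a logistic equation. The sketch below uses the coefficients as they are commonly quoted from the original publications (Citation2, Citation4). They are not stated in this article, so they should be treated as assumptions and checked against those sources before any use. The function names are illustrative only. Note that both equations were developed for hospital mortality, whereas the outcome in this study was ICU mortality.

```python
import math

def _logistic(logit):
    # Convert a logit into a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-logit))

def saps2_mortality(score):
    """Predicted hospital mortality for a SAPS 2 score.
    Coefficients assumed from the original SAPS 2 publication."""
    logit = -7.7631 + 0.0737 * score + 0.9971 * math.log(score + 1)
    return _logistic(logit)

def saps3_mortality(score):
    """Predicted hospital mortality for a SAPS 3 admission score
    (global equation). Coefficients assumed from the original
    SAPS 3 publication."""
    logit = -32.6659 + math.log(score + 20.5958) * 7.3068
    return _logistic(logit)
```

Under these assumed coefficients, a SAPS 2 score of 40 corresponds to a predicted mortality of roughly 25% and a SAPS 3 score of 50 to roughly 17%. Calibration tests such as HL compare predictions of this kind with the observed deaths.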

External validations of SAPS2 and SAPS3

Several authors directly compared SAPS2 and SAPS3 in external validations. Soares and Salluh (Citation15) validated both models in a study of 952 ICU cancer patients. The statistical analyses showed excellent discrimination for both models and acceptable calibration for SAPS3, but the SAPS2 calibration was poor. Ledoux et al. (Citation16) reported an inadequate mortality prediction of the global SAPS3 in their mixed-case study of a Western European ICU, despite very good discriminative power, and concluded that SAPS3 was not significantly better than SAPS2. SAPS3 was first validated for mixed surgical patients by Sakr et al. (Citation6). They compared in their study the performance of the two SAPS models in 1851 surgical ICU patients. The authors reported poor calibration for SAPS2 and SAPS3, while the discrimination was generally good for both models. Poole et al. (Citation17) postulated in a recently published multicenter study in 3661 mixed-case ICU patients that both SAPS models were unreliable tools for hospital mortality prediction. In this study, SAPS3 showed poorer statistical results than SAPS2. These findings stand in total contradiction to another study that was published by Juneja et al. in 2012 (Citation18). The authors compared the accuracy of ICU outcome prediction between general established ICU scoring systems and their recent counterparts. The results showed clearly that the younger generation scores were superior and had better accuracy.

Why is the calibration of SAPS3 and especially SAPS2 inaccurate?

Commonly, the inaccuracy of prognostic models is due to a possible disparity between the case mix of an external population and the original reference database. Several SAPS2 and SAPS3 variables might contribute to the limited predictive ability of the models in a cardiac surgery population. Cardiac surgery patients show temporary adverse effects and pathophysiological consequences related to the heart–lung machine, which can influence several variables in postoperative scoring systems, including SAPS2 and SAPS3, among others (Citation5). However, most of these pathophysiological changes, such as changes in electrolyte metabolism are temporary and do not have much influence on the outcome of these patients (Citation19). Furthermore, postoperative sedation limits the role of the Glasgow Coma Scale (GCS) as a prognostic parameter (Citation20). The GCS is affected by sedation, anesthesia, and paralysis, and the calculation requires clinical evaluation, which may be biased by subjective interpretation (Citation7). The patients’ age might also be an inconsistent variable, due to positive selection regarding the patients’ preoperative state and the surgeons’ experience. Nevertheless, it was recently verified that several variables of the two SAPS models were good predictors of outcome (Citation21).

What compromises the predictive ability of SAPS3 during the first day in the ICU?

The decrease of a study population due to the discharge of uncomplicated cases is a common and known phenomenon in cardiac surgery (). It is, therefore, important that an ICU score is reliable during the critical hours of the admission day to identify which patient is at higher risk (Citation22). At risk patients can be further observed in the ICU and the discharge of an endangered patient to the general floor, which might lead to a higher mortality due to ICU readmission or prolonged hospital stay, can be avoided.

One of the main reasons for the moderate discriminatory performance of SAPS3 on the admission day could be the different weights of the score's variables. Although SAPS3 considers only post-ICU admission data of the first hour (Box 3 of Table II), the incorporation of these data improves the prognostic capacity of SAPS3 by only 27.5%, whereas 50% of the predictive power of SAPS3 is derived from pre-ICU admission data (Box 1 of Table II). This is generally in contrast to Knaus et al. (Citation23) and to our observation in a cardiac surgery population during the last years (Citation24,Citation25). The major event in cardiac surgery patients is the operation itself. The small contribution of post-ICU admission data to the predictive power of SAPS3 might be a strong bias in a cardiac surgery population and could be the main weakness of SAPS3 during the critical early ICU period of cardiac surgery patients.

What conclusions could be drawn from this study?

Although cardiac surgery patients were excluded from the SAPS2 database, this score is among the most widely used systems in cardiac ICUs. Based on the results of this study, we recommend against using this poor prognostic tool in a cardiac surgery population.

The SAPS3 model specifically includes postoperative cardiac surgery patients. This study is the first external validation worldwide of SAPS3 in a purely cardiac surgery ICU population. Especially during the early period after ICU admission, SAPS3 is inferior to SAPS2 in risk evaluation.

This knowledge may have direct influence on the cardiac surgical population and the daily ICU routine since mortality prediction with SAPS models is not reliable. The treating clinician in cardiac surgery ICU should question the use of SAPS2 or SAPS3 in clinical trials and the comparison of ICUs of different health institutions based on these scores.

Declaration of interest: The authors declare that they have neither financial nor non-financial competing interests.

The authors alone are responsible for the content and writing of the paper.


References

  • Piacentini E, Ferrer C. Scoring prognostic system: to predict or not to predict. Minerva Anestesiol. 2012;78: 149–50.
  • Le Gall JR, Lemeshow S, Saulnier F. A new Simplified Acute Physiology Score (SAPS 2) based on a European/North American multicenter study. J Am Med Assoc. 1993;270: 2957–63.
  • Metnitz PG, Moreno RP, Almeida E, Jordan B, Bauer P, Campos RA, et al. SAPS 3 – from evaluation of the patient to evaluation of the intensive care unit. Part 1: objectives, methods and cohort description. Intensive Care Med. 2005;31:1336–44.
  • Moreno RP, Metnitz PG, Almeida E, Jordan B, Bauer P, Campos RA, et al. SAPS 3 – from evaluation of the patient to evaluation of the intensive care unit. Part 2: development of a prognostic model for hospital mortality at ICU admission. Intensive Care Med. 2005;31:1345–55.
  • Weiss YG, Merin G, Koganov E, Ribo A, Oppenheim-Eden A, Medalion B, et al. Post cardiopulmonary bypass hypoxemia: a prospective study on incidence, risk factors, and clinical significance. J Cardiothorac Vasc Anesth. 2000;14: 506–13.
  • Sakr Y, Krauss C, Amaral AC, Réa-Neto A, Specht M, Reinhart K, Marx G. Comparison of the performance of SAPS 2, SAPS 3, APACHE II, and their customized prognostic models in a surgical intensive care unit. Br J Anaesth. 2008;101:798–803.
  • Doerr F, Badreldin AM, Heldwein MB, Bossert T, Richter M, Lehmann T, et al. A comparative study of four intensive care outcome prediction models in cardiac surgery patients. J Cardiothorac Surg. 2011;6:21.
  • Metnitz PG, Valentin A, Vesely H, Alberti C, Lang T, Lenz K, et al. Prognostic performance and customization of the SAPS 2: results of a multicenter Austrian study. Simplified Acute Physiology Score. Intensive Care Med. 1999;2:192–7.
  • Moreno R, Miranda DR, Fidler V, Van Schilfgaarde R. Evaluation of two outcome prediction models on an independent database. Crit Care Med. 1998;26:50–61.
  • Metnitz PG, Vesely H, Valentin A, Popow C, Hiesmayr M, Lenz K, et al. Evaluation of an interdisciplinary data set for national intensive care unit assessment. Crit Care Med. 1999;27:1486–91.
  • Aegerter P, Boumendil A, Retbi A, Minvielle E, Dervaux B, Guidet B. SAPS 2 revisited. Intensive Care Med. 2005; 31:416–23.
  • Popovich MJ. If most intensive care units are graduating with honors, is it genuine quality or grade inflation? Crit Care Med. 2002;30:2145–6.
  • Strand K, Flaatten H. Severity scoring in the ICU: a review. Acta Anaesthesiol Scand. 2008;52:467–78.
  • Moreno RP, Metnitz PG, Metnitz B, Bauer P, Afonso de Carvalho S, Hoechtl A; SAPS 3 Investigators. Modeling in-hospital patient survival during the first 28 days after intensive care unit admission: a prognostic model for clinical trials in general critically ill patients. J Crit Care. 2008;23:339–48.
  • Soares M, Salluh JI. Validation of the SAPS 3 admission prognostic model in patients with cancer in need of intensive care. Intensive Care Med. 2006;32:1839–44.
  • Ledoux D, Canivet JL, Preiser JC, Lefrancq J, Damas P. SAPS 3 admission score: an external validation in a general intensive care population. Intensive Care Med. 2008;34:1873–7.
  • Poole D, Rossi C, Latronico N, Rossi G, Finazzi S, Bertolini G, GiViTI. Comparison between SAPS 2 and SAPS 3 in predicting hospital mortality in a cohort of 103 Italian ICUs. Is new always better? Intensive Care Med. 2012;38:1280–8.
  • Juneja D, Singh O, Nasa P, Dang R. Comparison of newer scoring systems with the conventional scoring systems in general intensive care population. Minerva Anestesiol. 2012;78:194–200.
  • Hekmat K, Kroener A, Stuetzer H, Schwinger RH, Kampe S, Bennink GB, Mehlhorn U. Daily assessment of organ dysfunction and survival in intensive care unit cardiac surgical patients. Ann Thorac Surg. 2005;79:1555–62.
  • Vincent JL, Moreno R, Takala J, Willatts S, De Mendonça A, Bruining H, et al. The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure. On behalf of the Working Group on Sepsis-Related Problems of the European Society of Intensive Care Medicine. Intensive Care Med. 1996;22:707–10.
  • Salciccioli JD, Cristia C, Chase M, Giberson T, Graver A, Gautam S, et al. Performance of SAPS 2 and SAPS 3 Scores in post-cardiac arrest. Minerva Anestesiol. 2012;78: 1341–7.
  • Zimmermann JE, Draper EA, Wagner DP. Comparing ICU populations: background and current methods. In: Sibbald WJ, Bion JF, eds. Evaluating critical care. Berlin: Springer Verlag 2001. pp. 121–39.
  • Knaus WA, Wagner DP, Draper EA, Zimmerman JE, Bergner M, Bastos PG, et al. The APACHE III prognostic system. Risk prediction of hospital mortality for critically ill hospitalized adults. Chest. 1991;100:1619–36.
  • Badreldin AM, Kroener A, Heldwein MB, Doerr F, Vogt H, Ismail MM, et al. Prognostic value of daily cardiac surgery score (CASUS) and its derivatives in cardiac surgery patients. Thorac Cardiovasc Surg. 2010;58:392–7.
  • Doerr F, Badreldin AM, Bender EM, Heldwein MB, Lehmann T, Bayer O, et al. Outcome prediction in cardiac surgery: the first logistic scoring model for cardiac surgical intensive care patients. Minerva Anestesiol. 2012;78:879–86.
