Emergency Medicine

Head-to-head comparison of 19 prediction models for short-term outcome in medical patients in the emergency department: a retrospective study

Article: 2290211 | Received 27 Jun 2023, Accepted 04 Nov 2023, Published online: 08 Dec 2023

Abstract

Introduction

Prediction models for identifying emergency department (ED) patients at high risk of poor outcome are often not externally validated. We aimed to perform a head-to-head comparison of the discriminatory performance of several prediction models in a large cohort of ED patients.

Methods

In this retrospective study, we selected prediction models that aim to predict poor outcome and we included adult medical ED patients. Primary outcome was 31-day mortality, secondary outcomes were 1-day mortality, 7-day mortality, and a composite endpoint of 31-day mortality and admission to intensive care unit (ICU).

The discriminatory performance of the prediction models was assessed using the area under the receiver operating characteristic curve (AUC). Finally, the prediction models with the highest performance for predicting 31-day mortality were selected to further examine calibration and appropriate clinical cut-off points.

Results

We included 19 prediction models and applied these to 2185 ED patients. Thirty-one-day mortality was 10.6% (231 patients), 1-day mortality was 1.4%, 7-day mortality was 4.4%, and 331 patients (15.1%) met the composite endpoint. The RISE UP and COPE scores showed similar and very good discriminatory performance for 31-day mortality (AUC 0.86), 1-day mortality (AUC 0.87), 7-day mortality (AUC 0.86) and the composite endpoint (AUC 0.81). Both scores were well calibrated. Almost no patients with RISE UP and COPE scores below 5% had an adverse outcome, while those with scores above 20% were at high risk of adverse outcome. Some of the other prediction models (i.e. APACHE II, NEWS, WPSS, MEWS, EWS and SOFA) showed significantly higher discriminatory performance for 1-day and 7-day mortality than for 31-day mortality.

Conclusions

Head-to-head validation of 19 prediction models in medical ED patients showed that the RISE UP and COPE scores outperformed the other models regarding 31-day mortality.

Introduction

Estimating the severity of the patient’s disease is an important challenge for physicians in the emergency department (ED). Crowding of EDs worldwide increases pressure on this estimation, since ED physicians have to make clinical decisions while multitasking, often with limited information and often being interrupted and working on several patients at the same time [Citation1,Citation2]. This not only affects patient satisfaction, but also places the patient at risk of delay in treatment, increased in-hospital length of stay and increased mortality [Citation3–7]. Short-term mortality after an ED visit is highly dependent on the setting and the examined population, with 30-day mortality ranging from 1.5 to 13.0% [Citation8–11]. Fast and reliable discrimination between high- and low-risk patients may guide clinical decision-making [Citation12].

Various clinical prediction models have been developed to address this need for discrimination between high- and low-risk patients. Some of these models were specifically designed for subpopulations of patients (e.g. patients with pneumonia (CURB-65) or sepsis (sepsis-related organ failure score (SOFA)) or older patients (risk stratification in the emergency department in acutely ill older patients (RISE UP)) [Citation13–15]. A recent systematic review reported on multiple prediction models that can be used in medical ED patients to predict short-term mortality [Citation16].

Although a wide array of prediction models have been developed and are used in different settings to detect patients who are at risk of poor outcome, the models are not always externally validated. Moreover, few studies have been performed to directly compare these models. Directly comparing prediction models is important to determine which models are most reliable and useful to optimize decision-making in the ED, since it is difficult to implement several models at the same time. Therefore, in this retrospective study, we aimed to provide a head-to-head comparison of the discriminatory performance to predict short term outcomes of previously developed prediction models in medical ED patients. In addition, in a selection of clinical prediction models with the highest discriminatory performance, we searched for clinically relevant cut-off values.

Materials and methods

Study design and setting

This retrospective cohort study was performed at the ED of the Maastricht University Medical Center+ (MUMC+). This is a combined secondary/tertiary care centre in the Netherlands, with 22,000 ED visits every year. In contrast to many other countries, where patients can visit the ED without referral by a health professional (open access ED), nearly all ED patients in the Netherlands are referred after an initial triage process. The medical ethics committee of the MUMC+ approved this study (METC 2018-0838) and waived the requirement of informed consent. This study was conducted and reported in accordance with the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) guidelines (Supplementary Table 1) [Citation17].

Prediction models

We conducted a PubMed search to identify relevant prediction models with a combination of methodological search terms (prognosis, prediction, risk stratification, score, model, outcome) and emergency medicine related search terms (emergency department, emergency care, mortality) (Supplementary Figure 1). Identified studies were reviewed for inclusion based on title and abstract. In addition, we checked the reference lists of the manuscripts we found in this manner. Furthermore, we used a recent systematic review of the literature on prediction models in the ED [Citation16]. The search was restricted to studies that were written in English or Dutch, and performed on adults.

We selected prediction models based on their feasibility and aims. In other words, we selected scores consisting of items that are readily available in the ED and aim to predict poor outcome (i.e. mortality, admission to ICU, hospital admission). We excluded models that were not feasible in an ED setting (e.g. models that include items that are not routinely registered, such as functional state). Models developed using machine learning techniques and radiologic scores were also excluded because these could not be reproduced in our ED setting. Prediction models were also excluded if the included items or the risk calculation were unclear. Trauma scores were excluded because we aimed to include non-trauma patients only.

Study sample

The study sample consisted of consecutive patients who visited our ED in two periods of a total of eight months between September 2020 and November 2021, and who were treated by an internist. Patients were included if they met the following criteria: adult ED patients (>18 years old) who were registered for treatment by an internist. In our ED, internists and their residents attend all patients who are referred for internal medicine or gastroenterology, as well as non-differentiated non-trauma patients. Patients were excluded if they revisited the ED after an earlier ED presentation during the study period, since their revisit was included in the follow-up period of the first ED visit. Patients in whom insufficient data were available were also excluded (e.g. patients who visited the ED because of needle stick injury, or patients whose medical records were not accessible).

Data collection

Data collection was performed by medical students and resident doctors. To ensure data quality, a sample of the collected data (approximately 10%) was double checked and discrepancies were resolved through discussion with a second reviewer. From the ED charts and the electronic medical records, we collected data on age, sex, comorbidity based on the Charlson Comorbidity Index (CCI), triage category based on the Manchester Triage System (MTS) and the reason for ED visit (according to the international classification of diseases (ICD) 10 system) [Citation18–20]. We retrieved the following vital signs: heart rate (HR), systolic blood pressure (SBP), diastolic blood pressure (DBP), mean arterial pressure (MAP), respiratory rate (RR), oxygen saturation, temperature and Glasgow Coma Scale (GCS). For each vital sign, we used the initial (i.e. first recorded) value during the ED visit. The Alert Verbal Pain Unresponsive (AVPU) scale was derived from the GCS [Citation21]. If RR or GCS were missing, we used paCO2 and descriptions in the medical records to deduce these values, similar to other studies [Citation15,Citation22,Citation23]. Furthermore, we collected the results of routinely performed laboratory tests: haemoglobin, haematocrit, leukocytes, thrombocytes, sodium, potassium, blood urea nitrogen (BUN), creatinine, lactate dehydrogenase (LDH), bilirubin, C-reactive protein (CRP), albumin, lactate, glucose, pH and pO2. If haematocrit and pO2 values were missing, we used haemoglobin and oxygen saturation to calculate these values, similar to other studies [Citation23–26].

Finally, we retrieved data on admission, length of hospital stay, prolonged length of hospital stay (>7 days), treatment restrictions, admission to intensive care unit (ICU), and 1-day, 7-day and 31-day mortality. Data on mortality were verified using the medical records. In the Netherlands, all deaths are registered by the municipal administration office, and these data are linked to the medical records. The collected data were anonymized and the dataset contained no identifying values that could link the information to an individual patient.

Outcomes

The primary outcome for assessing the discriminatory performance of the selected models was all-cause mortality within 31 days of ED presentation. The secondary outcomes were 1-day mortality, 7-day mortality and a composite outcome of 31-day mortality and/or ICU admission.

Statistical analysis

Regarding sample size, we aimed to comply with the rule of thumb for external validation of prediction models of including approximately 100 patients who met the primary outcome, similar to other studies [Citation27]. Assuming a 31-day mortality of 5%, we calculated a required sample size of 2000 patients.
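The events-based rule of thumb above reduces to simple arithmetic; the following minimal Python sketch (function name and values are illustrative, not from the study's code) reproduces the calculation:

```python
import math

def required_sample_size(target_events: int, anticipated_event_rate: float) -> int:
    """Smallest n such that the expected number of events,
    n * anticipated_event_rate, reaches the target."""
    return math.ceil(target_events / anticipated_event_rate)

# ~100 events at an anticipated 31-day mortality of 5%:
print(required_sample_size(100, 0.05))  # → 2000
```

Note that the observed 31-day mortality (10.6%) turned out to be higher than the anticipated 5%, so the 2185 included patients comfortably exceeded the 100-event target.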

Baseline characteristics were analysed using descriptive statistics, and a survival curve was plotted for mortality. For each patient, we completed the variables of the included prediction models. Because our ED, unlike those in many other countries, is not an open access ED, we performed a subgroup analysis in patients who were admitted to the hospital to increase comparability with the situation in other countries, assuming that criteria for admission are more similar. The discriminatory performance of the prediction models was assessed by calculating the area under the receiver operating characteristic curve (AUC). An AUC of 0.5 indicates discrimination no better than chance, whereas an AUC of 1.0 indicates perfect discrimination. We used stochastic regression imputation with predictive mean matching when a score could be completed in fewer than 95% of patients due to missing values. A sensitivity analysis was performed assuming that missing values were normal values. We compared the AUCs of the included models using the method of DeLong.
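As a sketch of the discrimination metric used throughout, the AUC equals the probability that a randomly chosen patient with the outcome receives a higher score than a randomly chosen patient without it (the Mann-Whitney formulation). The snippet below, using invented scores and labels rather than study data, computes it directly:

```python
def auc(scores, labels):
    """AUC via the Mann-Whitney statistic: the fraction of (event, non-event)
    pairs in which the event patient scores higher, counting ties as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]  # patients with the outcome
    neg = [s for s, y in zip(scores, labels) if y == 0]  # patients without
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical risk scores and 31-day outcomes (1 = died):
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
print(auc(scores, labels))  # → 0.75
```

The DeLong method additionally accounts for the correlation between two AUCs computed on the same patients; in practice such comparisons are typically run in R (e.g. via the pROC package) rather than hand-rolled.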

The prediction models with the highest discriminatory performance were subjected to further analysis. To check the accuracy of these models, calibration was assessed by visually inspecting the calibration plots. We also determined cut-off values for the prediction models with the highest discriminatory performance in order to guide clinical decision-making by identifying patients with low and high risk of adverse outcome. To achieve this, we calculated sensitivity, specificity, positive and negative predictive values for several cut-off values.
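The cut-off evaluation described above reduces to a 2x2 confusion matrix per candidate threshold. A self-contained sketch, with counts invented for illustration rather than taken from the study tables:

```python
def cutoff_metrics(scores, outcomes, cutoff):
    """Sensitivity, specificity, PPV and NPV when 'score >= cutoff' flags high risk."""
    tp = sum(1 for s, y in zip(scores, outcomes) if s >= cutoff and y)
    fp = sum(1 for s, y in zip(scores, outcomes) if s >= cutoff and not y)
    fn = sum(1 for s, y in zip(scores, outcomes) if s < cutoff and y)
    tn = sum(1 for s, y in zip(scores, outcomes) if s < cutoff and not y)
    return {
        "sensitivity": tp / (tp + fn),  # events correctly flagged
        "specificity": tn / (tn + fp),  # non-events correctly cleared
        "ppv": tp / (tp + fp),          # event rate among flagged patients
        "npv": tn / (tn + fn),          # non-event rate among cleared patients
    }

# Hypothetical predicted 31-day mortality risks and outcomes (1 = died):
risks = [0.02, 0.03, 0.04, 0.10, 0.15, 0.25, 0.30, 0.40]
died  = [0,    0,    0,    0,    1,    1,    0,    1]
print(cutoff_metrics(risks, died, cutoff=0.05)["npv"])  # → 1.0
```

A low cut-off is then chosen to maximize sensitivity and NPV (ruling out poor outcome) and a high cut-off to maximize specificity and PPV (ruling it in), as described above.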

All data were analysed using IBM SPSS Statistics for Windows, version 25.0 (IBM Corporation, Armonk, NY). DeLong tests were performed in R, version 4.0.0.

Results

Prediction models

In total, we included 19 prediction models (Table 1): the Coronavirus Clinical Characterisation Consortium (4C) mortality score, Acute Physiology and Chronic Health Evaluation II (APACHE II), the model of Asadollahi et al., COVID-19 Outcome Prediction in the ED (COPE), CURB-65, Early Warning Score (EWS), Modified Early Warning Score (MEWS), Modified Shock Index (MSI), National Early Warning Score (NEWS), Point Of Care Test model (POCT), Rapid Acute Physiology Score (RAPS), Rapid Emergency Medicine Score (REMS), the RISE UP score, the score of Prytherch et al., the qSOFA score, the SOFA score, the Sepsis Patient Evaluation in the ED (SPEED) score, the Simple Sepsis Early Prognostic Score (SSEPS) and the Worthing Physiological Scoring System (WPSS) [Citation13–15,Citation28–43]. A more detailed overview of the included prediction models is available in Supplementary Table 2.

Table 1. Overview of included prediction models.

Study sample

During the study period, 2835 ED patients were screened for inclusion. After exclusion of 416 patients who revisited the ED during the study period and 234 patients in whom insufficient data were available (e.g. patients who visited the ED because of needle stick injury, or patients whose medical records were not accessible), we included 2185 patients for analysis (). The median age was 67 years (IQR 53-78), and 1192 patients (54.6%) were male. The majority of patients (61.8%) were triaged as urgent (yellow, orange or red, according to MTS). In total, 1433 patients (65.6%) were admitted to the hospital. The most common reason for the ED visit and admission to the hospital was infection. The median length of hospital stay was 5 days (IQR 2-10) and a total of 564 patients (25.9%) were admitted longer than 7 days.

Table 2. Characteristics of the study sample.

In our sample, 231 patients (10.6%) died within 31 days after the ED visit, 31 patients (1.4%) died within 1 day, 97 patients (4.4%) died within 7 days, and 154 patients (7.0%) were admitted to the ICU. The survival curve is shown in Supplementary Figure 2. A total of 331 patients (15.1%) met the composite endpoint of 31-day mortality and/or ICU admission.

The 1433 patients who were admitted to the hospital were older and had more comorbidities than the patients who were not admitted. In this subgroup, mortality and the number of ICU admissions were also higher.

Discriminatory performance and calibration for 31-day mortality

We applied the prediction models to every patient in the study sample (Figure 1 and Table 3). The RISE UP and COPE scores showed the highest discriminatory performance, both yielding an AUC of 0.86 (95% CI: 0.84-0.89) for 31-day mortality and an AUC of 0.81 (95% CI: 0.78-0.83) for the composite endpoint. The RISE UP score showed excellent calibration (Figure 2). The COPE score was also well calibrated, but the calibration plot showed a slight underestimation of 31-day mortality by the model and a slope of >1.

Figure 1. Receiver operating characteristics (ROC) curves for predicting 31-day mortality.


Figure 2. Calibration plots of the predicted 31-day mortality (x axis) using the RISE UP score (left panel) and the COPE score (right panel). The RISE UP score shows excellent calibration. The COPE score is also well calibrated, but shows average underestimation of 31-day mortality and a slope of >1.


Table 3. Comparison of discriminatory performance of the included prediction models.

In comparison, the other prediction models yielded AUCs ranging from 0.59 to 0.81 for 31-day mortality and from 0.62 to 0.78 for the composite endpoint. The discriminatory performance of the RISE UP score and the COPE score was significantly better than that of the other models (Table 3). We decided to use the RISE UP score as a reference for comparison with the other prediction models, as this model showed better calibration.

Discriminatory performance for shorter term mortality

The RISE UP and COPE scores also showed high discriminatory performance with regard to mortality within 1 and 7 days, yielding AUCs of 0.87 (95% CI: 0.81-0.93) and 0.87 (95% CI: 0.81-0.92) for 1-day mortality, and 0.87 (95% CI: 0.84-0.90) and 0.87 (95% CI: 0.85-0.90) for 7-day mortality, respectively (Table 4). Some of the other prediction models (i.e. APACHE II, NEWS, WPSS, MEWS, EWS and SOFA) showed higher discriminatory performance for 1-day and 7-day mortality than for 31-day mortality.

Table 4. Comparison of discriminatory performance of the included prediction models in 1-day and 7-day mortality.

Because of missing values (vital signs and laboratory tests), most prediction models could not be calculated in all patients (Supplementary Table 2). Exceptions were four scores (SSEPS, qSOFA, MSI and RAPS), which could be calculated in more than 95% of patients. Missing values were less common in patients who were admitted to the hospital than in those who were not admitted. However, we found no relevant difference in discriminatory performance when performing a subgroup analysis in admitted patients only (Supplementary Table 3). In a sensitivity analysis, we found no difference in discriminatory performance when missing values were assumed to be normal (i.e. within the normal range) (Supplementary Table 4).

Determination of clinical cut-off values for 31-day mortality

Since the RISE UP score and the COPE score showed the best discriminatory performance to predict 31-day mortality, we determined clinical cut-off values for these two models. We defined the optimal cut-off value from a clinical point of view, i.e. we searched for a low cut-off value with an optimal sensitivity and negative predictive value (NPV) to rule out poor outcome, and for a high cut-off value with an optimal specificity and positive predictive value (PPV) to rule in poor outcome.

Patients with a RISE UP score <5% were found to be at (very) low risk of 31-day mortality (Table 5). In this subgroup of 1149 patients (52.6%), there were 10 deaths (0.9%), yielding an NPV of 99.1% (95% CI: 98.4–99.5). In addition, only 48 patients in this subgroup (4.2%) met the composite endpoint, yielding an NPV of 95.8% (95% CI: 94.7–96.8) (Table 6). Patients with a RISE UP score >20% were at high risk of a poor outcome. In this subgroup of 322 patients (14.7%), we found a 31-day mortality of 38.8%, and 46.6% of patients met the composite endpoint. In our cohort, 714 patients (32.7%) had a RISE UP score between 5 and 20%, with a PPV of 13.5% and an NPV of 90.8%.

Table 5. Cut off values of the RISE UP and COPE score and prognostic accuracy for 31-day mortality.

Table 6. Cut-off values of the RISE UP and COPE score and prognostic accuracy for 31-day mortality or ICU admission.

Patients with a COPE score <5% were found to be at (very) low risk of mortality as well (Tables 5 and 6). In this subgroup of 1211 patients (55.4%), there were 19 deaths (1.6%), yielding an NPV of 98.4% (95% CI: 97.6–99.0). Only 59 patients in this subgroup (4.9%) met the composite endpoint, yielding an NPV of 95.2% (95% CI: 93.9–96.1). Patients with a COPE score >20% were at high risk of a poor outcome. In this subgroup of 309 patients (14.1%), we found a 31-day mortality of 40.5%, and 49.1% of patients met the composite endpoint. In our cohort, 665 patients (30.4%) had a COPE score between 5 and 20%, with a PPV of 13.1% and an NPV of 90.5%.

Discussion

In this retrospective study, we externally validated 19 prediction models for their ability to predict mortality or ICU admission in a large cohort of medical ED patients. In our cohort, both the rate of admission (65.6%) and 31-day mortality (10.6%) were relatively high. The RISE UP score and the COPE score showed the highest discriminatory performance. Both models yielded high AUCs for 31-day mortality (both AUC of 0.86), 1-day mortality (both AUC of 0.87), 7-day mortality (both AUC of 0.87) and the composite endpoint of 31-day mortality and/or ICU admission (both AUC of 0.81). We decided to use the RISE UP score as a reference for comparison with other prediction models, as this model showed better calibration and has already been externally validated [Citation15,Citation23]. We found that, for both the RISE UP and COPE score, patients with a score <5% (half of the cohort) had favorable outcomes whereas patients with a score >20% (about 14% of the cohort) were at high risk of adverse outcomes.

While several of the 17 other prediction models (i.e. APACHE II, NEWS, WPSS, MEWS, EWS and SOFA) showed high discriminatory performance, with AUCs similar to those of the RISE UP and COPE scores for short-term mortality (1-day and 7-day mortality), the RISE UP and COPE scores showed high performance for both 31-day mortality and shorter term mortality. For 31-day mortality and the composite endpoint, the 17 other prediction models showed significantly lower discriminatory performance. The 4C mortality, Prytherch et al., CURB-65, Asadollahi et al., APACHE II, NEWS, SSEPS, WPSS, POCT, MEWS, EWS and REMS scores showed AUCs ranging from 0.71 to 0.81. The MSI, qSOFA, SOFA, SPEED and RAPS scores showed AUCs ranging from 0.59 to 0.69. Only the SSEPS, qSOFA, MSI and RAPS could be calculated in more than 95% of patients; this proportion was lower for all other prediction models.

Prediction models

In our study, we found two prediction models with a higher discriminatory performance than the other models. The RISE UP score was developed by some members of our team to predict 30-day mortality in older medical ED patients [Citation15]. Earlier validation studies found good to very good discriminatory performance in older ED patients (AUC 0.83) and in ED patients with COVID-19 (AUC 0.77 and 0.83) [Citation15,Citation23,Citation26]. The COPE score was developed to predict 28-day mortality and the need for ICU admission in ED patients with suspected or confirmed COVID-19, with AUCs ranging from 0.79 to 0.83 [Citation31]. In our study, both scores also worked very well in our cohort of medical ED patients (AUC 0.86-0.87), which also included 314 patients with COVID-19 (14.4%). Both scores proved applicable to 1-day, 7-day and 31-day mortality and to both sides of the clinical spectrum (the severely ill and the not severely ill). In our cohort, more than half of the patients had a RISE UP and COPE score <5%, and these patients had a very low risk of short-term mortality or ICU admission. In these low-risk patients, the clinician can safely choose to refer the patient for outpatient treatment or to discharge them at an early stage. On the other hand, patients with a RISE UP or COPE score >20% had a high risk of adverse outcome. The comparably good performance of the RISE UP and COPE scores may be explained by the fact that these models include largely the same items, which are known to reflect the severity of illness in ED patients (i.e. abnormal vital signs, LDH, BUN, bilirubin, CRP). In theory, the items of both models are readily available during the ED visit, so the probability of a poor outcome can be predicted in the first hours of the ED visit. Furthermore, both scores can be calculated using an online calculator (https://jscalc.io/calc/c4zL9aYdhMf7bjJb (RISE UP score) and https://www.cope2.nl/#/calculator (COPE score)). Prediction models that have good discriminatory performance and that can be computed easily and quickly are of great value in guiding clinical decision-making.

Here we review several of the other prediction models that also consist of a combination of vital signs and routine laboratory tests. The 4C mortality score was developed to predict in-hospital mortality in patients with COVID-19. It consists of eight items that are readily available in the ED [Citation28]. In a previous study, the AUC was similar to the AUC we found (0.81 versus 0.79). Therefore, the 4C mortality score may also be used in non-COVID patients. Prytherch et al. also developed a model using routine laboratory and administrative data to predict in-hospital mortality, with an AUC of 0.78 [Citation37]. However, in an Iranian validation study, the AUC of this model was only 0.66 [Citation44]. The difference may be explained by the inclusion of patients who were more severely ill (in-hospital mortality 19% versus 31-day mortality 10.6% in our study). The CURB-65 score is used to assess severity and predict mortality in patients with community-acquired pneumonia [Citation13]. In our cohort, we found a moderate to good AUC of 0.76. In other studies in ED patients, the CURB-65 score was found to have comparable AUCs, ranging from 0.77 to 0.79 [Citation22,Citation45,Citation46]. Asadollahi et al. developed a model using routine laboratory tests to predict mortality in ED patients [Citation30]. The discriminatory performance of this model for 31-day mortality in our cohort was good (AUC 0.76), which was lower than in the original study (AUC 0.85) but similar to the AUC in a recent Dutch study (AUC 0.80) [Citation40]. The SPEED score was developed to predict 28-day mortality in the ED and had a good predictive value in the original study (AUC 0.81) [Citation41]. In our study, we found a moderate AUC of 0.68, which may be explained by the lower mortality in our cohort (10.6% versus 30%).
The SPEED score was less feasible, because in our ED an arterial blood gas analysis is performed on indication only (an arterial blood gas was measured in one third of our patients). The POCT model and the SSEPS were also developed to predict mortality in ED patients [Citation40]. One of the merits of the POCT model is that it uses only items that are available within a few minutes (i.e. through point-of-care testing). The SSEPS had high feasibility in our ED setting. Both the POCT model and the SSEPS had good predictive value (AUC 0.80 and 0.76) in the original studies [Citation40,Citation42]. To our knowledge, we are the first to externally validate the POCT model and the SSEPS, finding moderate to good AUCs of 0.72 and 0.75, respectively.

The SOFA and APACHE II scores aim to predict mortality specifically in ICU patients. In previous studies, the AUC of the APACHE II score ranged from 0.78 to 0.81, which is higher than the AUC we found (AUC 0.75) [Citation47,Citation48]. The SOFA score showed AUCs ranging from 0.74 to 0.75 in previous studies, which is also higher than the AUC we found (AUC 0.67) [Citation14,Citation49]. However, only 7.0% of the patients in our cohort were admitted to ICU, which means that our population was more heterogeneous and mortality was probably more difficult to predict in ED patients using these models designed for ICU patients. Both the APACHE II and SOFA score showed higher discriminatory performance for 1-day mortality than for 31-day mortality. The APACHE II score was also less feasible in an ED setting, because in our ED, an arterial blood gas analysis is performed on indication only.

The other prediction models were specifically designed for early detection of high-risk patients by assigning points to vital signs only: the so-called early warning scores. In our cohort, the NEWS, WPSS, MEWS and EWS showed high discriminatory performance for 1-day (and 7-day) mortality (AUCs ranging from 0.82 to 0.86), and only moderate to good performance for 31-day mortality (AUCs ranging from 0.71 to 0.75). Therefore, these scores seem ideally suited for use in triage and prehospital care [Citation50]. Since these scores consist largely of a (more or less) complete set of vital signs, they can be applied easily and have high feasibility. The MSI, RAPS and qSOFA consist of a limited number of vital signs only (e.g. the MSI is calculated by dividing HR by MAP), and showed poor discriminatory performance in our study (AUCs ranging from 0.59 to 0.69). In an Iranian study comparing the WPSS, RAPS, REMS and MEWS, the WPSS showed the highest performance (AUC 0.73), followed by the MEWS (AUC 0.70), REMS (AUC 0.68) and RAPS (AUC 0.66), similar to our study [Citation44]. The MSI, RAPS and qSOFA had poor performance, similar to other studies [Citation38,Citation51,Citation52]. A narrative review compared several early warning scores (EWS, MEWS, NEWS) [Citation53]. The NEWS was found to be the most accurate early warning score in the general ED population, similar to our study (AUC 0.75). However, no single early warning score can reliably detect all patients at high risk of adverse outcome.
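As an illustration of how lean the simplest of these scores are, the Modified Shock Index under its common definition (heart rate divided by mean arterial pressure, with MAP approximated from systolic and diastolic pressure) is one line of arithmetic. Function names below are illustrative, not from any study code:

```python
def mean_arterial_pressure(sbp: float, dbp: float) -> float:
    """Common approximation: MAP is roughly DBP + (SBP - DBP) / 3."""
    return dbp + (sbp - dbp) / 3

def modified_shock_index(hr: float, sbp: float, dbp: float) -> float:
    """MSI = HR / MAP."""
    return hr / mean_arterial_pressure(sbp, dbp)

# Example: HR 110/min with a blood pressure of 100/70 mmHg
print(round(modified_shock_index(110, 100, 70), 2))  # → 1.38
```

This simplicity explains both the high feasibility of such scores and, plausibly, their limited discriminatory performance compared with models that also incorporate laboratory values.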

Study limitations

Our study had several limitations. First, our study was performed in a single medical centre, limiting the generalizability of the results. However, our cohort of medical ED patients was relatively large and the follow-up of all patients was complete. By validating all prediction models in the same cohort, there were no differences in the patient sample, and we could truly compare the scores [Citation54]. Second, despite our thorough search, it is possible that not all applicable prediction models were retrieved for our analysis. We chose prediction models that were feasible in our ED setting, which may be different for other EDs. Third, we recognize that we validated some prediction models in a different population or setting than the one in which they were developed (i.e. ICU patients, transport setting). However, because there is an abundance of prediction models for different populations, conditions and settings, it is hard to implement multiple models at the same time. Therefore, we tried to find prediction models that are useful in a population of medical ED patients. Fourth, many prediction models in our study could not be completed in about 20% of the patients because of missing values. However, we found that missing values mainly occurred in patients who were not admitted to the hospital. In addition, one could assume that when a value is missing, the medical team in the ED saw no indication to determine that value, and that a missing value therefore probably equals a normal value. Therefore, we decided to perform a sensitivity analysis of the prediction models assuming that missing values were normal and found no differences in discriminatory performance. Consequently, we conclude that these prediction models are also useful in case of missing values. Nevertheless, missing values are an important limitation of a retrospective study and prospective studies are needed to truly compare these models.
Fifth, it is likely that the AUC of a prediction model is influenced by the specific moment at which the score is calculated. Since ED physicians are accustomed to administering prompt treatment to patients with abnormal vital signs, this influences (i.e. improves) the prognosis of the patient and thereby decreases the AUC of early warning scores. However, in retrospective studies it is not possible to properly assess the size of this effect, and the same phenomenon may apply to scores that include laboratory values. Last, in a subgroup of patients with pre-existing frailty or severe comorbidity, it was decided to initiate conservative care only (27.6% had treatment restrictions). As these decisions affect both mortality and the likelihood of ICU referral, and may differ between countries, we decided to study ICU admissions as part of a composite outcome only.

Conclusion

In conclusion, the RISE UP and COPE scores had the highest discriminatory performance for short-term outcome in medical ED patients and are able to identify ED patients at both low and high risk of poor outcomes. They may therefore assist in guiding clinical decision-making and in allocating healthcare resources in a high-risk environment such as the ED. However, further research on the impact of implementing these prediction models in an ED setting needs to be conducted.

Author contributions

PD and PMS collected clinical data with the help of medical students. PD performed the statistical analysis. All authors interpreted data. PD and SL drafted the first version of the manuscript. NZ, WD, SM, JC and PMS critically reviewed the manuscript. All authors have read and approved the final version of the manuscript.

Supplemental material


Acknowledgments

We are grateful for the help from medical students A. van de Koolwijk, C. Hovens, M. Ronner, M. van Roosendaal, S. Cimino, J. Weijers and J. van Welij in collecting the data necessary to perform this study.

Disclosure statement

The RISE UP score was developed by members of our team. These team members (NZ and PMS) were not involved in data analysis for the current study.

Data availability statement

Additional data are available upon reasonable request.

Correction Statement

This article has been corrected with minor changes. These changes do not impact the academic content of the article.

References

  • Laxmisan A, Hakimzada F, Sayan OR, et al. The multitasking clinician: decision-making and cognitive demand during and after team handoffs in emergency care. Int J Med Inform. 2007;76(11-12):1–12. doi: 10.1016/j.ijmedinf.2006.09.019.
  • van Geffen M, van der Waaij KM, Stassen PM. Number, nature & impact of incoming telephone calls on residents and their work during evening shifts. Acute Med. 2022;21(1):5–11. doi: 10.52964/AMJA.0886.
  • van der Linden N, van der Linden MC, Richards JR, et al. Effects of emergency department crowding on the delivery of timely care in an inner-city hospital in The Netherlands. Eur J Emerg Med. 2016;23(5):337–343. doi: 10.1097/MEJ.0000000000000268.
  • Ter Avest E, Onnes BT, van der Vaart T, et al. Hurry up, it’s quiet in the emergency department. Neth J Med. 2018;76(1):32–35.
  • van der Linden C, Reijnen R, Derlet RW, et al. Emergency department crowding in The Netherlands: managers’ experiences. Int J Emerg Med. 2013;6(1):41. doi: 10.1186/1865-1380-6-41.
  • Liew D, Liew D, Kennedy MP. Emergency department length of stay independently predicts excess inpatient length of stay. Med J Aust. 2003;179(10):524–526. doi: 10.5694/j.1326-5377.2003.tb05676.x.
  • Sun BC, Hsia RY, Weiss RE, et al. Effect of emergency department crowding on outcomes of admitted patients. Ann Emerg Med. 2013;61(6):605–611 e6. doi: 10.1016/j.annemergmed.2012.10.026.
  • Burke LG, Epstein SK, Burke RC, et al. Trends in mortality for medicare beneficiaries treated in the emergency department From 2009 to 2016. JAMA Intern Med. 2020;180(1):80–88. doi: 10.1001/jamainternmed.2019.4866.
  • Jones S, Moulton C, Swift S, et al. Association between delays to patient admission from the emergency department and all-cause 30-day mortality. Emerg Med J. 2022;39(3):168–173. doi: 10.1136/emermed-2021-211572.
  • Baker M, Clancy M. Can mortality rates for patients who die within the emergency department, within 30 days of discharge from the emergency department, or within 30 days of admission from the emergency department be easily measured? Emerg Med J. 2006;23(8):601–603. doi: 10.1136/emj.2005.028134.
  • van Doorn W, Stassen PM, Borggreve HF, et al. A comparison of machine learning models versus clinical evaluation for mortality prediction in patients with sepsis. PLoS One. 2021;16(1):e0245157. doi: 10.1371/journal.pone.0245157.
  • Gill TM. The central role of prognosis in clinical decision making. JAMA. 2012;307(2):199–200. doi: 10.1001/jama.2011.1992.
  • Lim WS, van der Eerden MM, Laing R, et al. Defining community acquired pneumonia severity on presentation to hospital: an international derivation and validation study. Thorax. 2003;58(5):377–382. doi: 10.1136/thorax.58.5.377.
  • Vincent JL, Moreno R, Takala J, et al. The SOFA (sepsis-related organ failure assessment) score to describe organ dysfunction/failure. On behalf of the Working Group on Sepsis-Related Problems of the European Society of Intensive Care Medicine. Intensive Care Med. 1996;22(7):707–710. doi: 10.1007/BF01709751.
  • Zelis N, Buijs J, de Leeuw PW, et al. A new simplified model for predicting 30-day mortality in older medical emergency department patients: the rise up score. Eur J Intern Med. 2020;77:36–43. doi: 10.1016/j.ejim.2020.02.021.
  • Brink A, Alsma J, Fortuin AW, et al. Prediction models for mortality in adult patients visiting the emergency department: a systematic review. Acute Med J. 2019;18(3):171–183. doi: 10.52964/AMJA.0771.
  • Cuschieri S. The STROBE guidelines. Saudi J Anaesth. 2019;13(Suppl 1):S31–S34. doi: 10.4103/sja.SJA_543_18.
  • Cooke MW, Jinks S. Does the Manchester triage system detect the critically ill? J Accid Emerg Med. 1999;16(3):179–181. doi: 10.1136/emj.16.3.179.
  • Charlson ME, Pompei P, Ales KL, et al. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J Chronic Dis. 1987;40(5):373–383. doi: 10.1016/0021-9681(87)90171-8.
  • World Health Organization. The ICD-10 classification of mental and behavioural disorders: diagnostic criteria for research. Geneva (Switzerland): World Health Organization; 1993.
  • Janagama SR, Newberry JA, Kohn MA, et al. Is AVPU comparable to GCS in critical prehospital decisions? - A cross-sectional study. Am J Emerg Med. 2022;59:106–110. doi: 10.1016/j.ajem.2022.06.042.
  • Howell MD, Donnino MW, Talmor D, et al. Performance of severity of illness scoring systems in emergency department patients with infection. Acad Emerg Med. 2007;14(8):709–714. doi: 10.1197/j.aem.2007.02.036.
  • van Dam P, Zelis N, van Kuijk SMJ, et al. Performance of prediction models for short-term outcome in COVID-19 patients in the emergency department: a retrospective study. Ann Med. 2021;53(1):402–409. doi: 10.1080/07853890.2021.1891453.
  • Kokholm G. Simultaneous measurements of blood pH, pCO2, pO2 and concentrations of hemoglobin and its derivates–a multicenter study. Scand J Clin Lab Invest Suppl. 1990;203:75–86. doi: 10.3109/00365519009087494.
  • Madan A. Correlation between the levels of SpO2 and PaO2. Lung India. 2017;34(3):307–308. doi: 10.4103/lungindia.lungindia_106_17.
  • van Dam PM, Zelis N, Stassen P, et al. Validating the RISE UP score for predicting prognosis in patients with COVID-19 in the emergency department: a retrospective study. BMJ Open. 2021;11(2):e045141. doi: 10.1136/bmjopen-2020-045141.
  • Collins GS, Ogundimu EO, Altman DG. Sample size considerations for the external validation of a multivariable prognostic model: a resampling study. Stat Med. 2016;35(2):214–226. doi: 10.1002/sim.6787.
  • Knight SR, Ho A, Pius R, et al. Risk stratification of patients admitted to hospital with covid-19 using the ISARIC WHO clinical characterisation protocol: development and validation of the 4C mortality score. BMJ. 2020;370:m3339.
  • Knaus WA, Draper EA, Wagner DP, et al. APACHE II: a severity of disease classification system. Crit Care Med. 1985;13(10):818–829. doi: 10.1097/00003246-198510000-00009.
  • Asadollahi K, Hastings IM, Gill GV, et al. Prediction of hospital mortality from admission laboratory data and patient age: a simple model. Emerg Med Australas. 2011;23(3):354–363. doi: 10.1111/j.1742-6723.2011.01410.x.
  • van Klaveren D, Rekkas A, Alsma J, et al. COVID outcome prediction in the emergency department (COPE): using retrospective dutch hospital data to develop simple and valid models for predicting mortality and need for intensive care unit admission in patients who present at the emergency department with suspected COVID-19. BMJ Open. 2021;11(9):e051468. doi: 10.1136/bmjopen-2021-051468.
  • Duckitt RW, Buxton-Thomas R, Walker J, et al. Worthing physiological scoring system: derivation and validation of a physiological early-warning system for medical admissions. An observational, population-based single-centre study. Br J Anaesth. 2007;98(6):769–774. doi: 10.1093/bja/aem097.
  • Subbe CP, Kruger M, Rutherford P, et al. Validation of a modified early warning score in medical admissions. QJM. 2001;94(10):521–526. doi: 10.1093/qjmed/94.10.521.
  • Liu YC, Liu JH, Fang ZA, et al. Modified shock index and mortality rate of emergency patients. World J Emerg Med. 2012;3(2):114–117. doi: 10.5847/wjem.j.issn.1920-8642.2012.02.006.
  • Royal College of Physicians (RCP). National Early Warning Score (NEWS): standardising the assessment of acute illness severity in the NHS. Report of a Working Party. London: Royal College of Physicians; 2012.
  • Rhee KJ, Fisher CJ, Jr., Willitis NH. The rapid acute physiology score. Am J Emerg Med. 1987;5(4):278–282. doi: 10.1016/0735-6757(87)90350-0.
  • Prytherch DR, Sirl JS, Schmidt P, et al. The use of routine laboratory data to predict in-hospital death in medical admissions. Resuscitation. 2005;66(2):203–207. doi: 10.1016/j.resuscitation.2005.02.011.
  • Seymour CW, Liu VX, Iwashyna TJ, et al. Assessment of clinical criteria for sepsis: for the third international consensus definitions for sepsis and septic shock (sepsis-3). JAMA. 2016;315(8):762–774. doi: 10.1001/jama.2016.0288.
  • Olsson T, Terent A, Lind L. Rapid emergency medicine score: a new prognostic tool for in-hospital mortality in nonsurgical emergency department patients. J Intern Med. 2004;255(5):579–587. doi: 10.1111/j.1365-2796.2004.01321.x.
  • Brink A, Schuttevaer R, Alsma J, et al. Predicting 30-day mortality using point-of-care testing; an external validation and derivation study. PLoS One. 2020;15(9):e0239318. doi: 10.1371/journal.pone.0239318.
  • Bewersdorf JP, Hautmann O, Kofink D, et al. The SPEED (sepsis patient evaluation in the emergency department) score: a risk stratification and outcome prediction tool. Eur J Emerg Med. 2017;24(3):170–175. doi: 10.1097/MEJ.0000000000000344.
  • Liu B, Li D, Cheng Y, et al. Development and internal validation of a simple prognostic score for early sepsis risk stratification in the emergency department. BMJ Open. 2021;11(7):e046009. doi: 10.1136/bmjopen-2020-046009.
  • Morgan R, Lloyd-Williams F, Wright MM, et al. An early warning score system for detecting developing critical illness. Clin Intens Care. 1997;8:100.
  • Rahmatinejad Z, Tohidinezhad F, Rahmatinejad F, et al. Internal validation and comparison of the prognostic performance of models based on six emergency scoring systems to predict in-hospital mortality in the emergency department. BMC Emerg Med. 2021;21(1):68. doi: 10.1186/s12873-021-00459-7.
  • Roest AA, Tegtmeier J, Heyligen JJ, et al. Risk stratification by abbMEDS and CURB-65 in relation to treatment and clinical disposition of the septic patient at the emergency department: a cohort study. BMC Emerg Med. 2015;15(1):29. doi: 10.1186/s12873-015-0056-z.
  • Hilderink MJ, Roest AA, Hermans M, et al. Predictive accuracy and feasibility of risk stratification scores for 28-day mortality of patients with sepsis in an emergency department. Eur J Emerg Med. 2015;22(5):331–337. doi: 10.1097/MEJ.0000000000000185.
  • Czajka S, Ziębińska K, Marczenko K, et al. Validation of APACHE II, APACHE III and SAPS II scores in in-hospital and one year mortality prediction in a mixed intensive care unit in Poland: a cohort study. BMC Anesthesiol. 2020;20(1):296. doi: 10.1186/s12871-020-01203-7.
  • Capuzzo M, Valpondi V, Sgarbi A, et al. Validation of severity scoring systems SAPS II and APACHE II in a single-center population. Intensive Care Med. 2000;26(12):1779–1785. doi: 10.1007/s001340000715.
  • Wełna M, Adamik B, Goździk W, et al. External validation of the sepsis severity score. Int J Immunopathol Pharmacol. 2020;34:2058738420936386. doi: 10.1177/2058738420936386.
  • Martin-Rodriguez F, Sanz-Garcia A, Ortega GJ, et al. Tracking the National Early Warning Score 2 from prehospital care to the emergency department: a prospective, ambulance-based, observational study. Prehosp Emerg Care. 2022;27(1):75–83.
  • Brabrand M, Hallas P, Hansen SN, et al. Using scores to identify patients at risk of short term mortality at arrival to the acute medical unit: a validation study of six existing scores. Eur J Intern Med. 2017;45:32–36. doi: 10.1016/j.ejim.2017.09.042.
  • Kim SY, Hong KJ, Shin SD, et al. Validation of the shock index, modified shock index, and age shock index for predicting mortality of geriatric trauma patients in emergency departments. J Korean Med Sci. 2016;31(12):2026–2032. doi: 10.3346/jkms.2016.31.12.2026.
  • Nannan Panday RS, Minderhoud TC, Alam N, et al. Prognostic value of early warning scores in the emergency department (ED) and acute medical unit (AMU): A narrative review. Eur J Intern Med. 2017;45:20–31. doi: 10.1016/j.ejim.2017.09.027.
  • Keuning BE, Kaufmann T, Wiersema R, et al. Mortality prediction models in the adult critically ill: a scoping review. Acta Anaesthesiol Scand. 2020;64(4):424–442. doi: 10.1111/aas.13527.