Short Report

Proposed clinical indicators for efficient screening and testing for COVID-19 infection using Classification and Regression Trees (CART) analysis

Pages 1109-1112 | Received 15 Jun 2020, Accepted 03 Sep 2020, Published online: 20 Oct 2020

ABSTRACT

The introduction and rapid transmission of SARS-CoV-2 in the United States led to methods for assessing, mitigating, and containing COVID-19 that were based on limited knowledge. Screening for testing has been based on symptoms typically observed in inpatients, yet outpatient symptoms may differ. Classification and Regression Trees (CART) recursive partitioning was used to build a decision tree classifying participants as laboratory-confirmed cases or non-cases. Demographic and symptom data from patients aged 18–87 years enrolled from March 29 to June 8, 2020 were included. Presence or absence of SARS-CoV-2 was the target variable. Of 832 participants tested, 77 (9.3%) tested positive. Cases significantly more often reported diarrhea (12 percentage points (PP)), fever (15 PP), nausea/vomiting (9 PP), loss of taste/smell (52 PP), and contact with a COVID-19 case (54 PP), but less frequently reported sore throat (−27 PP). The optimal four-terminal-node tree had a sensitivity of 69%, specificity of 78%, positive predictive value of 20%, negative predictive value of 97%, and AUC of 76%. Among those referred for testing, negative responses to two questions could classify about half (49%) of tested persons as being at low risk for SARS-CoV-2, saving limited testing resources. Outpatient symptoms of COVID-19 appear to be broader than the inpatient syndrome.

Initial supplies of anticipated COVID-19 vaccines may be limited, and the first available vaccines may need to be prioritized for essential workers, the most vulnerable, or those likely to have a robust response to vaccination. Another priority group could be those not previously infected. Those who screen out of testing may be less likely to have been infected with SARS-CoV-2 and thus may be prioritized for vaccination when supplies are limited.

Introduction

The introduction of the SARS-CoV-2 virus into the United States (U.S.) was earlier than anticipated. Furthermore, the rapid transmission of SARS-CoV-2 led to methods to assess, mitigate, and contain the resulting COVID-19 disease that were based on limited knowledge of its epidemiology. Given potentially insufficient testing supplies and the need to identify cases rapidly, access to testing was based on symptoms typical of an acute respiratory infection, such as cough, fever, and shortness of breath, which have been reported as emblematic of COVID-19 infection [1,2]. A previous report in the U.S. noted that using cough, fever, shortness of breath, and sore throat as the sole screening criteria for COVID-19 infection among health care personnel might have missed 17% of symptomatic cases [3]. Gastrointestinal symptoms and other symptoms atypical of acute respiratory illness have been noted among hospitalized COVID-19 patients [2,4]. Because outpatient symptom complexes may differ from the better-known inpatient syndrome, a data-driven set of clinical indicators for COVID-19 would help to characterize outpatient symptoms and to identify those who would most benefit from testing when testing availability is limited. Furthermore, identifying those who are less likely to have been exposed to the SARS-CoV-2 virus may simultaneously identify those who may respond well to a potential vaccine.

Methods

Symptomatic individuals who had either exposure to a case of COVID-19 or a symptom typical of respiratory illness were scheduled at a centralized outpatient COVID-19 testing facility. SARS-CoV-2 was detected in nasopharyngeal specimens using highly sensitive RT-PCR tests [5], which can detect as few as 5 copies of virus. While at the clinic, patients being tested received a flier describing this study. After their results were returned to the electronic medical record, a list of tested patients was sent daily to the research team, who contacted them by e-mail or telephone. Asymptomatic patients were not scheduled for testing at the time of this study’s data collection and were not included in the analyses. After giving electronic or verbal consent, participants completed a short enrollment survey, online or by phone, that collected basic demographic information and symptoms at enrollment. Participants received a 20 USD debit card for participating in the study. The University of Pittsburgh IRB approved this study.

Data were captured in a REDCap database and loaded into a SAS data set. Participants were grouped into COVID-19 cases and non-cases for comparison of demographic characteristics and symptoms. Descriptive statistics were computed for continuous data, and frequency distributions were estimated for categorical data. Parametric (e.g., t-test) or nonparametric tests (e.g., chi-square, Wilcoxon), as appropriate, were used to compare the distribution of demographic and clinical characteristics between COVID-19 cases and non-cases, using SAS version 9.4 (SAS Institute, Cary, NC). Significance was set at alpha = 0.05.
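For the categorical comparisons, the typical case is a chi-square test on a 2×2 table of case status by presence of a symptom. The sketch below is illustrative only and is not the authors’ SAS code: the function name is hypothetical, it computes the uncorrected Pearson statistic (no Yates continuity correction), and the counts in the usage line are invented, not study data.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Uncorrected Pearson chi-square test for the 2x2 table

                  symptom   no symptom
        case         a          b
        non-case     c          d

    Returns (statistic, p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column total must be positive")
    # Closed form for a 2x2 table: n * (ad - bc)^2 / product of the margins.
    stat = n * (a * d - b * c) ** 2 / margins
    # With 1 df the statistic is Z^2, so P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Invented counts: 30 of 40 cases vs. 60 of 240 non-cases report the symptom.
stat, p = chi_square_2x2(30, 10, 60, 180)
print(f"chi-square = {stat:.2f}, P = {p:.4f}")
```

The erfc shortcut works only because a 2×2 table has one degree of freedom; larger tables need the general chi-square distribution.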

Classification and Regression Trees (CART) recursive partitioning, based on the presence or absence of symptoms, was used to create a decision tree classifying enrollees as laboratory-confirmed (RT-PCR) COVID-19 cases or non-cases [6].
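The core step of recursive partitioning is choosing, at each node, the yes/no question that best separates cases from non-cases. CART then repeats that search inside each child node. The following is a minimal sketch of the split search. It assumes the Gini impurity criterion commonly used with CART (the report does not say which criterion was applied) and binary symptom predictors. The function names and toy records are hypothetical.

```python
from typing import Optional


def gini(positives: int, n: int) -> float:
    """Gini impurity of a node with `positives` cases among `n` records."""
    if n == 0:
        return 0.0
    p = positives / n
    return 2 * p * (1 - p)  # same as 1 - p**2 - (1 - p)**2 for two classes


def best_split(rows: list[dict[str, bool]], labels: list[bool],
               predictors: list[str]) -> Optional[tuple[str, float]]:
    """Return (predictor, impurity decrease) for the question that most
    lowers the size-weighted Gini impurity, or None if no predictor
    divides the node."""
    n = len(rows)
    parent = gini(sum(labels), n)
    best: Optional[tuple[str, float]] = None
    for var in predictors:
        yes = [y for r, y in zip(rows, labels) if r[var]]
        no = [y for r, y in zip(rows, labels) if not r[var]]
        if not yes or not no:
            continue  # everyone answers the same way; no split possible
        children = (len(yes) * gini(sum(yes), len(yes))
                    + len(no) * gini(sum(no), len(no))) / n
        if best is None or parent - children > best[1]:
            best = (var, parent - children)
    return best


# Toy records (hypothetical): exposure separates cases perfectly here.
rows = [
    {"exposure": True, "sore_throat": True},
    {"exposure": True, "sore_throat": False},
    {"exposure": False, "sore_throat": True},
    {"exposure": False, "sore_throat": False},
]
labels = [True, True, False, False]
print(best_split(rows, labels, ["exposure", "sore_throat"]))  # → ('exposure', 0.5)
```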

Symptoms reported at enrollment (fever, chills, cough, sore throat, shortness of breath, muscle aches, abdominal pain, nausea/vomiting, diarrhea, headache, and decrease or loss of taste or smell), together with contact with a COVID-19 case, were included in the CART model as categorical predictors. Presence or absence of SARS-CoV-2 was the target variable. The tree was grown with a minimum parent-node size of 10% of the sample and a stopping rule requiring at least 5% of the sample in each terminal node. Tenfold cross-validation was used to evaluate reliability. To avoid overfitting, the tree was pruned using a maximum acceptable difference in risk of one standard error between the pruned subtree and the subtree with the minimum risk. Missing data were handled by surrogate splits. The Hosmer-Lemeshow goodness-of-fit test confirmed the suitability of the trees. The area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and positive and negative predictive values were assessed.
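The size-based stopping rules and the one-standard-error pruning criterion can be sketched as follows. This is an illustration under stated assumptions, not the software the authors used. The fractions mirror the 10% and 5% thresholds above, and “risk” is assumed to be the cross-validated misclassification rate. The function names and the risk values in the usage example are hypothetical.

```python
from dataclasses import dataclass

MIN_PARENT_FRACTION = 0.10  # a node needs >= 10% of the sample to be split
MIN_CHILD_FRACTION = 0.05   # each node created by a split needs >= 5%


def split_allowed(n_parent: int, n_yes: int, n_no: int, n_total: int) -> bool:
    """Apply the two size-based stopping rules to a candidate split."""
    return (n_parent >= MIN_PARENT_FRACTION * n_total
            and min(n_yes, n_no) >= MIN_CHILD_FRACTION * n_total)


@dataclass
class Subtree:
    n_terminal: int  # number of terminal nodes
    cv_risk: float   # cross-validated misclassification risk
    se: float        # standard error of cv_risk


def one_se_choice(candidates: list[Subtree],
                  max_se_diff: float = 1.0) -> Subtree:
    """Smallest subtree whose risk is within `max_se_diff` standard errors
    of the minimum-risk subtree."""
    best = min(candidates, key=lambda t: t.cv_risk)
    limit = best.cv_risk + max_se_diff * best.se
    eligible = [t for t in candidates if t.cv_risk <= limit]
    return min(eligible, key=lambda t: t.n_terminal)


# Hypothetical cross-validation results for four nested subtrees.
candidates = [
    Subtree(n_terminal=2, cv_risk=0.120, se=0.008),
    Subtree(n_terminal=4, cv_risk=0.100, se=0.008),
    Subtree(n_terminal=7, cv_risk=0.090, se=0.008),
    Subtree(n_terminal=10, cv_risk=0.086, se=0.008),
]
print(one_se_choice(candidates).n_terminal)  # → 7
```

The one-standard-error rule deliberately trades a small, statistically indistinguishable increase in risk for a simpler tree, which is what makes a short screening questionnaire possible.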

Results

Of 7,698 individuals tested for SARS-CoV-2 at the centralized testing facility, 361 (4.7%) were positive for the virus. Only adults ≥18 years of age were included. Among all adults tested, 832 (11.3%) enrolled in this study. Among enrollees, 77 (9.3%) were positive for SARS-CoV-2. No case had tested negative within 14 days of receiving a positive result.

Among enrollees, COVID-19 cases were more likely than non-cases to be nonwhite (Black or other race; 22.1% vs. 11.7%; P = .009) and were less likely to smoke or vape (9.3% vs. 20.0%; P = .024), but did not differ by sex, age group, or health care worker status (Table 1). Those who tested positive significantly more frequently reported loss of taste/smell (75.9% vs. 23.6%; P < .001), diarrhea (49.4% vs. 37.7%; P = .046), fever (81.8% vs. 67.3%; P = .009), nausea/vomiting (18.4% vs. 9.7%; P = .018), and contact with a COVID-19 case (73.7% vs. 20%; P < .001) (Table 2). Conversely, COVID-19 cases less frequently reported sore throat than non-cases (39.5% vs. 66.5%; P < .001).

Table 1. Baseline demographics by COVID-19 status

Table 2. Presenting symptoms for outpatients attending the COVID-19 testing clinic

The CART analysis produced an optimal tree with four terminal nodes: one with a recommendation to test, one with a recommendation not to test, and two with a recommendation to consider testing (Figure 1). This tree has a sensitivity of 69%, a specificity of 78%, and an AUC of 76%. Its positive predictive value was 20%, and its negative predictive value was 97%.
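All four accuracy figures come from cross-tabulating the screening recommendation against the RT-PCR result. The sketch below spells out those standard definitions. It assumes the tree has been collapsed to a binary test/do-not-test rule, because the report does not say how the two “consider testing” nodes were counted. The function name is hypothetical, and the counts are invented rather than taken from the study.

```python
def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Accuracy measures for a binary test/do-not-test rule checked against
    RT-PCR: tp/fp = flagged positives/negatives, fn/tn = screened-out
    positives/negatives."""
    return {
        "sensitivity": tp / (tp + fn),  # share of RT-PCR positives flagged
        "specificity": tn / (tn + fp),  # share of RT-PCR negatives screened out
        "ppv": tp / (tp + fp),          # share of flagged persons who are positive
        "npv": tn / (tn + fn),          # share of screened-out persons who are negative
    }


# Invented counts for illustration only (not the study's node counts).
print(screening_metrics(tp=60, fp=140, fn=20, tn=580))
```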

Figure 1. Decision tree for symptoms to determine testing needs for outpatient COVID-19

When screening individuals with acute symptoms as possible COVID-19 cases, it is important to ask the questions in the order shown in Figure 1. If an individual first reports having been exposed to a COVID-19 case, the likelihood of COVID-19 infection is 21%, and testing is recommended.

If the individual denies exposure to a COVID-19 case, then the next question should be, “Have you experienced a recent loss of taste or smell?” If the answer is negative, the need for testing is minimal, as 98% of these individuals’ test results were negative for COVID-19; thus, the recommendation would be not to test. Negative responses to these two questions, following the left side of the tree in Figure 1, would allow almost half (49%; 406/832) of the participants to be ruled out as likely non-cases without testing.

There are two scenarios in which testing should be considered (aqua boxes in Figure 1): if the individual reports loss of taste or smell, testing should be considered whether or not sore throat is also reported, because some positives would be identified.
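The question order described above can be written as a short function. This is a reconstruction from the text, since Figure 1 itself is not reproduced here. The function name, argument names, and node labels are hypothetical, and the node percentages in the comments are the ones quoted above.

```python
def screen(exposed_to_case: bool, lost_taste_or_smell: bool,
           sore_throat: bool) -> tuple[str, str]:
    """Return (terminal node, advice), asking the questions in tree order."""
    # Question 1: contact with a COVID-19 case (about 21% positive if yes).
    if exposed_to_case:
        return "exposed", "test"
    # Question 2: recent loss of taste or smell (about 98% negative if no).
    if not lost_taste_or_smell:
        return "no exposure, no taste/smell loss", "do not test"
    # The remaining node splits on sore throat; both terminal nodes carry
    # the same "consider testing" advice.
    if sore_throat:
        return "taste/smell loss with sore throat", "consider testing"
    return "taste/smell loss without sore throat", "consider testing"


print(screen(exposed_to_case=False, lost_taste_or_smell=False,
             sore_throat=True))
# Share of enrollees routed to "do not test" by the first two questions.
print(f"{406 / 832:.0%}")  # → 49%
```

Because the exposure question short-circuits everything else, asking it first is what lets the remaining questions be skipped for exposed patients.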

Discussion

CART decision tree analysis has been used to assist clinical decision making across a wide assortment of health conditions, in diagnosis, treatment, and outcome prediction [7–13]. In rapid response to the pandemic caused by a novel coronavirus, the local testing center based its testing criteria on knowledge of inpatient symptoms, including fever, cough, and shortness of breath [14]. We found that among those referred for outpatient testing, the symptoms that were significantly more frequent among cases than among non-cases differed from those typically reported among inpatients with COVID-19 infection, on which early identification of possible cases was based. In our sample, cough, headache, fever, and decrease or loss of taste or smell were the most frequently reported symptoms among COVID-19 cases. These findings were similar to those of a report from Korea [15], in which outpatients most frequently presented with cough, sputum, reduced sense of smell, and nasal congestion. In our study, there were significant differences in reported symptoms between cases and non-cases, including loss of taste or smell, which was reported three times more often by cases than by non-cases. Thus, less severe illness seems to be associated with a broader range of symptoms that are not limited to respiratory symptoms.

Many locales have lacked adequate resources to test for SARS-CoV-2. Of importance to such locales, we found that negative responses to two questions could classify about half of tested persons as having a very low risk for SARS-CoV-2 and, if used for screening, would save limited testing resources. These questions are: Was the patient in contact with a COVID-19 case? Has the patient experienced a loss of taste or smell? If the patient had no exposure to a COVID-19 case and reported no loss of taste or smell, the patient need not be tested. Our algorithm has a sensitivity of 69% and, in our low-prevalence setting, a 97% negative predictive value.

Within the next year, it is expected that one or more SARS-CoV-2 vaccines will have become available for use in the U.S. At the time of this writing, phase III drug trials are just beginning. It is unknown which vaccine(s) will be licensed, how much vaccine will be available, which population subgroups may be specifically targeted for vaccination, and how much vaccination will cost. It is possible that vaccination will be focused on those who have not been diagnosed with COVID-19. Using the decision tree for symptomatic individuals described herein, those individuals not likely to be infected may be more easily identified for targeted vaccination.

Strengths and limitations

Strengths of this study include the use of a control group of non-cases, development of a testing algorithm using sophisticated recursive partitioning, and the finding of high specificity. Because predictive values depend on prevalence, other locales may observe different predictive values even with the same sensitivity and specificity. The number of positives in our sample was modest, reflecting the low prevalence in our region. The response rate was also modest. This reflects recruitment by e-mail and phone at a time when infection control measures prohibited in-person recruiting, and the fact that non-cases were enrolled primarily by e-mail (i.e., greater efforts at non-case recruitment were unnecessary because 755 non-cases were recruited with little effort). It was argued early in the U.S. pandemic that testing those with mild disease may not be an appropriate use of resources because of the moderate sensitivity of the tests [16]; however, the RT-PCR test used to identify COVID-19 cases had high sensitivity and specificity [5], meaning that few true cases were likely missed, which strengthens the conclusions from the CART modeling.
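The dependence of predictive values on prevalence follows from Bayes’ theorem. A minimal sketch, assuming the tree is treated as a single binary rule with fixed sensitivity and specificity; the function name is hypothetical. Because it uses the rounded summary figures, it will not exactly reproduce the reported 20% and 97%, but it shows the direction of the effect: as prevalence falls, PPV drops and NPV rises.

```python
def predictive_values(sens: float, spec: float,
                      prevalence: float) -> tuple[float, float]:
    """PPV and NPV implied by Bayes' theorem for a given prevalence."""
    tp = sens * prevalence              # true positives per person screened
    fp = (1 - spec) * (1 - prevalence)  # false positives
    fn = (1 - sens) * prevalence        # false negatives
    tn = spec * (1 - prevalence)        # true negatives
    return tp / (tp + fp), tn / (tn + fn)


# Same rounded sensitivity and specificity, two prevalences from the text:
# 9.3% among enrollees and 4.7% among everyone tested at the facility.
for prev in (0.093, 0.047):
    ppv, npv = predictive_values(0.69, 0.78, prev)
    print(f"prevalence {prev:.1%}: PPV {ppv:.0%}, NPV {npv:.0%}")
```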

Conclusion

The outpatient symptoms of COVID-19 appear to be broader than the well-known inpatient syndrome and prominently feature loss of taste or smell, reported at a three-fold higher rate (76% vs. 24%) by cases than by non-cases. Our algorithm has a modest sensitivity (69%) but a high negative predictive value (97%), allowing testing to be prioritized in locales where testing supplies are limited. Those who screen out of testing may be less likely to have been infected with SARS-CoV-2 and thus may be a higher priority for vaccination while vaccine supplies are limited.

Disclosure of potential conflicts of interest

Drs. Zimmerman, Nowalk, and Balasubramani and Ms. Eng and Ms. Sax have research funding from Merck & Co, Inc. for unrelated projects. Dr. Zimmerman has funding from Sanofi for an unrelated project. Drs. Bear and Taber and Mr. and Ms. Clarke have no conflicts to report.

IRB approval

University of Pittsburgh IRB Protocol #19070407

Additional information

Funding

This study was supported through a cooperative agreement with the Centers for Disease Control and Prevention (CDC) through grant number U01 IP000467 and the National Institutes of Health grant number 1UL1 TR001857. The US Flu VE Network is supported through cooperative agreements funded by CDC. The findings and conclusions are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention. It is subject to the CDC’s Open Access Policy.

References

1. Ihle-Hansen H, Berge T, Tveita A, Rønning EJ, Ernø PE, Andersen EL, Wang CH, Tveit A, Myrstad M. COVID-19: symptoms, course of illness and use of clinical scoring systems for the first 42 patients admitted to a Norwegian local hospital. Tidsskr Nor Laegeforen. 2020. doi:10.4045/tidsskr.20.0301.
2. Centers for Disease Control and Prevention, National Center for Immunization and Respiratory Diseases (NCIRD), Division of Viral Diseases. Symptoms of coronavirus. [accessed 2020 May 31]. https://www.cdc.gov/coronavirus/2019-ncov/symptoms-testing/symptoms.html.
3. Chow EJ, Schwartz NG, Tobolowsky FA, Zacks RLT, Huntington-Frazier M, Reddy SC, Rao AK. Symptom screening at illness onset of health care personnel with SARS-CoV-2 infection in King County, Washington. JAMA. 2020;323(20):2087. doi:10.1001/jama.2020.6637.
4. Luers JC, Rokohl AC, Loreck N, Wawer Matos PA, Augustin M, Dewald F, Klein F, Lehmann C, Heindl LM. Olfactory and gustatory dysfunction in coronavirus disease 19 (COVID-19). Clin Infect Dis. 2020. doi:10.1093/cid/ciaa52.
5. Lu X, Wang L, Sakthivel SK, Whitaker B, Murray J, Kamili S, Lynch B, Malapati L, Burke SA, Harcourt J. US CDC real-time reverse transcription PCR panel for detection of severe acute respiratory syndrome coronavirus 2. Emerg Infect Dis. 2020;26(8):8. doi:10.3201/eid2608.201246.
6. Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and regression trees (Wadsworth Statistics/Probability). Boca Raton (FL): Chapman & Hall; 1984.
7. Ebell MH, Hansen JG. Proposed clinical decision rules to diagnose acute rhinosinusitis among adults in primary care. Ann Fam Med. 2017;15(4):347–54. doi:10.1370/afm.2060.
8. Shi K-Q, Zhou Y-Y, Yan H-D, Li H, Wu F-L, Xie Y-Y, Braddock M, Lin X-Y, Zheng M-H. Classification and regression tree analysis of acute-on-chronic hepatitis B liver failure: seeing the forest for the trees. J Viral Hepat. 2017;24(2):132–40. doi:10.1111/jvh.12617.
9. Quirke M, Curran EM, O’Kelly P, Moran R, Daly E, Aylward S, McElvaney G, Wakai A. Risk factors for amendment in type, duration and setting of prescribed outpatient parenteral antimicrobial therapy (OPAT) for adult patients with cellulitis: a retrospective cohort study and CART analysis. Postgrad Med J. 2018;94(1107):25–31. doi:10.1136/postgradmedj-2017-134968.
10. Ghiasi MM, Zendehboudi S, Mohsenipour AA. Decision tree-based diagnosis of coronary artery disease: CART model. Comput Methods Programs Biomed. 2020;192:105400. doi:10.1016/j.cmpb.2020.105400.
11. Mofrad RB, Schoonenboom NS, Tijms BM, Scheltens P, Visser PJ, van der Flier WM, Teunissen CE. Decision tree supports the interpretation of CSF biomarkers in Alzheimer’s disease. Alzheimer’s Demen. 2019;11:1–9.
12. Hong W, Dong L, Huang Q, Wu W, Wu J, Wang Y. Prediction of severe acute pancreatitis using classification and regression tree analysis. Dig Dis Sci. 2011;56(12):3664–71. doi:10.1007/s10620-011-1849-x.
13. Zimmerman RK, Balasubramani GK, Nowalk MP, Eng H, Urbanski L, Jackson ML, Jackson LA, McLean HQ, Belongia EA, Monto AS, et al. Classification and Regression Trees (CART) analysis for predicting influenza. Paper presented at: 37th Annual Meeting of the Society for Medical Decision Making; 2015 October 18–21; St. Louis, MO.
14. Pascarella G, Strumia A, Piliego C, Bruno F, Del Buono R, Costa F, Scarlata S, Agrò FE. COVID-19 diagnosis and management: a comprehensive review. J Intern Med. 2020;288:192–206. doi:10.1111/joim.13091.
15. Kim G-U, Kim M-J, Ra SH, Lee J, Bae S, Jung J, Kim S-H. Clinical characteristics of asymptomatic and symptomatic patients with mild COVID-19. Clin Microbiol Infect. 2020;26(7):948.e1–948.e3. doi:10.1016/j.cmi.2020.04.040.
16. Zitek T. The appropriate use of testing for COVID-19. West J Emerg Med. 2020;21(3):470. doi:10.5811/westjem.2020.4.47370.