Research Article

Analysis of Machine Learning Methods for COVID-19 Detection Using Serum Raman Spectroscopy

Pages 1147-1168 | Received 29 Jan 2021, Accepted 27 Aug 2021, Published online: 07 Sep 2021
 

ABSTRACT

One of the most challenging aspects of the emergent coronavirus disease 2019 (COVID-19) pandemic, caused by infection with severe acute respiratory syndrome coronavirus 2, has been the need for mass diagnostic testing to detect and track infection rates at the population level. Current tests such as reverse transcription-polymerase chain reaction can be low-throughput and labor-intensive. An ultra-fast and accurate mode of detecting COVID-19 infection is crucial for healthcare workers to make informed decisions in fast-paced clinical settings. Raman spectra are high-dimensional and feature-rich, and they have validated predictive power for identifying human disease, cancer, and bacterial and viral infections. This suggests that a supervised classification machine learning algorithm could be trained on Raman spectra of patient serum samples to detect COVID-19 infection. We developed a novel stacked subsemble classifier model coupled with an iteratively validated and automated feature selection and engineering workflow to predict COVID-19 infection status from Raman spectra of 250 human serum samples, with a 10-fold cross-validated classification accuracy of 98.0% (98.6% precision and 98.5% recall). Furthermore, we benchmarked nine machine learning and artificial neural network models using eight standalone performance metrics to assess whether ensemble methods offered any improvement over baseline machine learning models. Using rank-normalized scores derived from the performance metrics, the stacked subsemble model ranked higher than the Multi-layer Perceptron, which in turn ranked higher than the eight other machine learning models. This study serves as a proof of concept that stacked ensemble machine learning models are a powerful predictive tool for COVID-19 diagnostics.
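The abstract does not spell out how the rank-normalized scores were computed. The following is a minimal sketch of one plausible reading: rank the models on each metric, scale the ranks to the range 0 to 1, and average across metrics. The function name `rank_normalized_scores`, the model names, and all metric values are illustrative assumptions, not results or code from the study.

```python
from statistics import mean


def rank_normalized_scores(metrics):
    """Average normalized rank per model (assumed scheme, higher is better).

    metrics: {model_name: {metric_name: value}}, where larger values are better.
    Returns {model_name: score in [0, 1]}.
    """
    models = list(metrics)
    metric_names = list(next(iter(metrics.values())))
    n = len(models)
    per_model = {m: [] for m in models}
    for name in metric_names:
        values = {m: metrics[m][name] for m in models}
        for m in models:
            # Average rank with ties: models strictly worse, plus half the ties.
            worse = sum(values[o] < values[m] for o in models)
            tied = sum(values[o] == values[m] for o in models) - 1
            rank = worse + tied / 2.0  # 0 = worst, n - 1 = best
            per_model[m].append(rank / (n - 1) if n > 1 else 1.0)
    return {m: mean(ranks) for m, ranks in per_model.items()}


# Illustrative values only; these are NOT numbers reported in the article.
example = {
    "model_a": {"accuracy": 0.95, "precision": 0.96, "recall": 0.94},
    "model_b": {"accuracy": 0.90, "precision": 0.97, "recall": 0.91},
    "model_c": {"accuracy": 0.85, "precision": 0.80, "recall": 0.91},
}
scores = rank_normalized_scores(example)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Averaging ranks rather than raw metric values keeps any single metric with a wide numeric spread from dominating the comparison, which is presumably the motivation for rank normalization here.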

Highlights

  • Subsemble achieves 98.4% accuracy on Raman spectra of COVID-19 serum samples.

  • Subsemble outperformed nine other machine learning models in several metrics.

  • Forest-based feature selection and Wiener filtering improved model performance.
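The steps named in the highlights can be sketched as a single pipeline: Wiener filtering, forest-based feature selection, and a stacked classifier scored with 10-fold cross-validation. This is a rough illustration under stated assumptions, not the author's implementation. It uses synthetic spectra instead of the serum data, scikit-learn's `StackingClassifier` as a stand-in for the subsemble model, and arbitrary choices for the window size, base learners, and tree counts.

```python
import numpy as np
from scipy.signal import wiener
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from sklearn.svm import SVC


def make_synthetic_spectra(n_samples=120, n_points=200, seed=0):
    """Toy spectra (assumption): class 1 carries one extra peak plus noise."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n_samples // 2)
    axis = np.linspace(0, 1, n_points)
    baseline = np.exp(-((axis - 0.3) ** 2) / 0.01)
    peak = 0.8 * np.exp(-((axis - 0.7) ** 2) / 0.002)
    noise = rng.normal(0, 0.3, (n_samples, n_points))
    return baseline + np.outer(y, peak) + noise, y


def denoise(X, window=5):
    """Apply a 1-D Wiener filter to each spectrum (row); window is assumed."""
    return np.apply_along_axis(wiener, 1, X, window)


X, y = make_synthetic_spectra()

model = make_pipeline(
    FunctionTransformer(denoise),
    # Keep features whose forest importance is at least the mean importance.
    SelectFromModel(RandomForestClassifier(n_estimators=50, random_state=0)),
    StandardScaler(),
    # Plain stacking, used here in place of a subsemble.
    StackingClassifier(
        estimators=[
            ("svc", SVC(random_state=0)),
            ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
        ],
        final_estimator=LogisticRegression(),
        cv=5,
    ),
)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"mean 10-fold accuracy on synthetic data: {scores.mean():.3f}")
```

Feature selection sits inside the pipeline so that it is refit on each training fold. Selecting features on the full dataset before cross-validation would leak information from the held-out fold into the accuracy estimate.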

Author Statement

All persons who meet authorship criteria are listed as authors, and all authors certify that they have participated sufficiently in the work to take public responsibility for the content, including participation in the concept, design, analysis, writing, or revision of the manuscript. Furthermore, each author certifies that this material or similar material has not been and will not be submitted to or published in any other publication before its appearance in the journal.

Acknowledgements

I would like to thank Colby Banbury for his mentorship as part of the Erevna Research Fellowship.

Disclosure Statement

No potential conflict of interest was reported by the author(s).

Data Availability

The serum Raman spectroscopy data were sourced from Yin et al. (2020) and can be found at 10.6084/m9.figshare.12159924.v1. The computational pipelines for analyzing Raman spectra data, the pre-processing workflow, model performance benchmarks, and saved trained models are publicly available at https://github.com/davidchen0420/Raman_Spectroscopy_COVID_19.

Additional information

Funding

The author received no financial support for the research, authorship, and/or publication of this article.
