Infectious Diseases

Development and validation of HBV surveillance models using big data and machine learning

Article: 2314237 | Received 25 Sep 2023, Accepted 30 Jan 2024, Published online: 10 Feb 2024
 

Abstract

Background

The construction of a robust healthcare information system is fundamental to enhancing countries’ capabilities in the surveillance and control of hepatitis B virus (HBV). By making use of China’s rapidly expanding primary healthcare system, an approach based on big data and machine learning (ML) could help towards the World Health Organization’s (WHO) goals for eliminating HBV infection, which include reaching 90% diagnosis and treatment rates by 2030. We aimed to develop and validate HBV detection models using routine clinical data, in order to improve the detection of HBV and support the development of effective interventions to mitigate the impact of this disease in China.

Methods

Relevant data records were extracted from the Hospital Information System of the Family Medicine Clinic of the University of Hong Kong-Shenzhen Hospital and structured using state-of-the-art natural language processing (NLP) techniques. Several ML methods were used to develop HBV risk assessment models. The model outputs were interpreted using SHapley Additive exPlanations (SHAP) values, and the models were validated on cohort data randomly divided into derivation and validation samples at a ratio of 2:1, within a five-fold cross-validation framework.
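The validation and interpretation steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the 2:1 split is a simple random split of record indices and that five-fold cross-validation is run within the derivation sample. The helper names (`split_derivation_validation`, `k_fold_splits`, `exact_shapley`) are hypothetical. The Shapley function computes exact Shapley values by enumerating feature coalitions and replacing features outside a coalition with a baseline value (an assumed value function); libraries such as the `shap` package use model-specific or approximate algorithms instead, because exact enumeration grows exponentially with the number of features.

```python
import random
from itertools import combinations
from math import factorial


def split_derivation_validation(n_records, ratio=(2, 1), seed=0):
    """Randomly split record indices into derivation and validation sets (2:1 by default)."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    n_derivation = n_records * ratio[0] // sum(ratio)
    return indices[:n_derivation], indices[n_derivation:]


def k_fold_splits(indices, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation over `indices`."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k disjoint folds of near-equal size
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def exact_shapley(predict, x, baseline):
    """Exact Shapley value of each feature for predict(x), relative to predict(baseline).

    Features outside a coalition are set to their baseline value (assumed value function).
    Cost grows exponentially with len(x), so this is only practical for a few features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if j in subset or j == i else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi
```

With the 158,988 attendance records reported in the Results, `split_derivation_validation(158_988)` yields a derivation set of 105,992 indices, matching the reported derivation sample size. For a linear score such as `0.5*x0 + 2*x1 - x2` with a zero baseline, each exact Shapley value equals the coefficient times the feature value.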

Results

Patterns of physical complaints in patients with and without HBV infection were identified by processing 158,988 clinic attendance records. After cases without any clinical parameters were removed from the derivation sample (n = 105,992), 27,392 cases were analysed using six modelling methods. A simplified HBV model based on patients’ physical complaints and clinical parameters was developed, showing good discrimination (AUC = 0.78) and calibration (goodness-of-fit test p > 0.05).
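The two performance measures reported above can be computed without external dependencies. This sketch assumes the AUC is the area under the ROC curve, computed here through the equivalent Mann-Whitney rank statistic, and that the unnamed goodness-of-fit test is the Hosmer-Lemeshow test with roughly equal-sized risk groups; the article does not state which test was used. The chi-square tail probability uses the closed form for even degrees of freedom, so the default of 10 groups (8 degrees of freedom) avoids a SciPy dependency. The function names are hypothetical.

```python
from math import exp, factorial


def auc(labels, scores):
    """ROC AUC via the Mann-Whitney rank statistic (tied scores get average ranks)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability P(X > x); closed form valid for even df only."""
    if df % 2:
        raise ValueError("closed form requires even degrees of freedom")
    half = x / 2
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))


def hosmer_lemeshow(labels, probs, groups=10):
    """Hosmer-Lemeshow statistic and p-value (groups of near-equal size, groups - 2 df)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        size = len(idx)
        observed = sum(labels[i] for i in idx)  # observed cases in the group
        expected = sum(probs[i] for i in idx)   # predicted cases in the group
        mean_p = expected / size
        stat += (observed - expected) ** 2 / (size * mean_p * (1 - mean_p))
    return stat, chi2_sf_even_df(stat, groups - 2)
```

Labels are assumed to be coded 0/1. The Hosmer-Lemeshow sketch also assumes every group is non-empty and that no group's mean predicted probability is exactly 0 or 1.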

Conclusions

Suspected-case detection models for HBV, which show potential for clinical deployment, have been developed to improve HBV surveillance in the primary care setting in China. (Word count: 264)

KEY MESSAGES

  • This study developed a suspected-case detection model for HBV that can facilitate early identification and treatment of HBV in the primary care setting in China, contributing towards the WHO’s goals for eliminating HBV infection.

  • We used state-of-the-art natural language processing techniques to structure the data records, supporting the development of a robust healthcare information system that can enhance the surveillance and control of HBV in China.
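Structuring turns free-text complaints into features that risk models can use. The sketch below is a deliberately simple stand-in for the state-of-the-art NLP described in the article, not the authors' pipeline: it uses a hypothetical English symptom lexicon (`SYMPTOM_PATTERNS`) and a crude clause-level negation rule, whereas the real records, language, and models are not described here. It only illustrates the input and output of the step: one free-text complaint in, one row of binary indicators out.

```python
import re

# Hypothetical lexicon mapping a feature name to illustrative English surface patterns.
# The study's real vocabulary, language, and NLP models are not described here.
SYMPTOM_PATTERNS = {
    "fatigue": [r"\bfatigue\b", r"\btired(ness)?\b"],
    "abdominal_pain": [r"\babdominal pain\b", r"\bstomach ache\b"],
    "jaundice": [r"\bjaundice\b", r"\byellow(ing)? (skin|eyes)\b"],
    "nausea": [r"\bnausea\b", r"\bnauseous\b"],
}

# Crude negation rule: a negation cue earlier in the same clause (no . ; or , in between).
NEGATION = re.compile(r"\b(no|denies|without)\b[^.;,]*$")


def structure_complaint(text):
    """Map one free-text complaint to binary symptom indicators (1 = mentioned, not negated)."""
    text = text.lower()
    features = {}
    for name, patterns in SYMPTOM_PATTERNS.items():
        present = 0
        for pattern in patterns:
            for match in re.finditer(pattern, text):
                # Only the text before the mention is checked for a negation cue.
                if not NEGATION.search(text[:match.start()]):
                    present = 1
        features[name] = present
    return features
```

Applied to every attendance record, a function like this yields one row of 0/1 columns per visit, which is the kind of tabular input a risk model consumes.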

Acknowledgements

We would like to express our sincere gratitude to the University of Hong Kong-Shenzhen Hospital for their invaluable support. This enabled us to develop and validate innovative HBV surveillance models using big data and machine learning techniques. Without this support, this study would not have been possible. We extend our heartfelt thanks to the hospital and all those involved in making this collaboration successful.

Author contributions

WD was the main author who designed and implemented the methods, analyzed the data, and developed the HBV surveillance model. WCWW provided supervision and advice throughout the study. CCDR prepared the first draft of the manuscript. DC, DZ, and YX reviewed and modified the manuscript. WKS reviewed and verified the underlying data and provided professional comments on revisions to the manuscript. All authors had full access to all the data in the study, read and approved the final version of the manuscript, and had final responsibility for the decision to submit for publication.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

Data of this study are available from the corresponding author upon reasonable request.