Abstract
Mobile phones, due to their audio processing capabilities, have the potential to facilitate the diagnosis of heart disease through automated auscultation. However, such a platform is likely to be used by non-experts, and hence, it is essential that such a device is able to automatically differentiate poor quality from diagnostically useful recordings since non-experts are more likely to make poor-quality recordings. This paper investigates the automated signal quality assessment of heart sound recordings performed using both mobile phone-based and commercial medical-grade electronic stethoscopes. The recordings, each 60 s long, were taken from 151 random adult individuals with varying diagnoses referred to a cardiac clinic and were professionally annotated by five experts. A mean voting procedure was used to compute a final quality label for each recording. Nine signal quality indices were defined and calculated for each recording. A logistic regression model for classifying binary quality was then trained and tested. The inter-rater agreement level for the stethoscope and mobile phone recordings was measured using Conger’s kappa for multiclass sets and found to be 0.24 and 0.54, respectively. One-third of all the mobile phone-recorded phonocardiogram (PCG) signals were found to be of sufficient quality for analysis. The classifier was able to distinguish good- and poor-quality mobile phone recordings with 82.2% accuracy, and those made with the electronic stethoscope with an accuracy of 86.5%. We conclude that our classification approach provides a mechanism for substantially improving auscultation recordings by non-experts. This work is the first systematic evaluation of a PCG signal quality classification algorithm (using a separate test dataset) and assessment of the quality of PCG recordings captured by non-experts, using both a medical-grade digital stethoscope and a mobile phone.
Disclosure statement
The authors report no conflicts of interest. The authors alone are responsible for the content and writing of this article.
Notes
1 A retrospective analysis of the robustness of the classification when varying m and r was performed when using single-feature seSQI classification with regularised logistic regression. It was found that increasing m led to a decrease in classification accuracy for both the iPhone and Littmann devices, while the results varied little (less than 2%) when varying r between 0.0008 and 0.002. Values of r outside of this range led to decreased performance.