Presenting artificial intelligence, deep learning, and machine learning studies to clinicians and healthcare stakeholders: an introductory reference with a guideline and a Clinical AI Research (CAIR) checklist proposal

Jakub Olczaka Institute of Clinical Sciences, Danderyd University Hospital, Karolinska Institute, SwedenCorrespondence[email protected]

John Pavlopoulosb Department of Computer and System Sciences, Stockholm University, Sweden

Jasper Prijsc Flinders University, Adelaide, Australia;d Department of Trauma Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands

Frank F A Ijpmad Department of Trauma Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands;e The Machine Learning Consortium

Job N Doornbergc Flinders University, Adelaide, Australia;d Department of Trauma Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands;e The Machine Learning Consortium

Claes Lundströmf Center for Medical Image Science and Visualization, Linköping University, Sweden

Joel Hedlundf Center for Medical Image Science and Visualization, Linköping University, Sweden

Max Gordona Institute of Clinical Sciences, Danderyd University Hospital, Karolinska Institute, Sweden

show all

Pages 513-525 | Published online: 14 May 2021

Cite this article
https://doi.org/10.1080/17453674.2021.1918389
CrossMark

Full Article
Figures & data
References
Supplemental
Citations
Metrics
Licensing
Reprints & Permissions
View PDF PDF View EPUB EPUB

Adamson A S, Smith A. Machine learning and health care disparities in dermatology. JAMA Dermatol 2018; 154(11): 1247.
PubMed Web of Science ®Google Scholar
Anderson P, Fernando B, Johnson M, Gould S. SPICE: Semantic Propositional Image Caption Evaluation. arXiv:160708822 [cs] [Internet] 2016 Jul 29 [cited 2020 Nov 30]. Available from: http://arxiv.org/abs/1607.08822
Google Scholar
Badgeley M A, Zech J R, Oakden-Rayner L, Glicksberg B S, Liu M, Gale W, et al. Deep learning predicts hip fracture using confounding patient and healthcare variables. NPJ Digit Med 2019; 2(1): 1–10.
PubMedGoogle Scholar
Bandos A I, Obuchowski N A. Evaluation of diagnostic accuracy in free-response detection-localization tasks using ROC tools. Stat Methods Med Res 2019; 28(6): 1808–25.
PubMed Web of Science ®Google Scholar
Banerjee S, Lavie A. METEOR: an automatic metric for mt evaluation with improved correlation with human judgments. In: Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization [Internet]. Ann Arbor, MI: Association for Computational Linguistics; 2005 [cited 2020 Nov 30]. p. 65–72. Available from: https://www.aclweb.org/anthology/W05-0909
Google Scholar
Beil M, Proft I, van Heerden D, Sviri S, van Heerden P V. Ethical considerations about artificial intelligence for prognostication in intensive care. Intensive Care Med Exp 2019; 7(1): 70.
PubMed Web of Science ®Google Scholar
Chakraborty D P. A brief history of free-response receiver operating characteristic paradigm data analysis. Academic Radiology 2013; 20(7): 915–9.
PubMed Web of Science ®Google Scholar
Chicco D, Jurman G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics 2020; 21(1): 6.
PubMed Web of Science ®Google Scholar
Esteva A, Kuprel B, Novoa R A, Ko J, Swetter S M, Blau H M, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 2017; 542(7639): 115.
PubMed Web of Science ®Google Scholar
Felländer-Tsai L. AI ethics, accountability, and sustainability: revisiting the Hippocratic oath. Acta Orthop 2020; 91(1): 1–2.
PubMed Web of Science ®Google Scholar
Fleming N. How artificial intelligence is changing drug discovery. Nature 2018; 557(7707): S55–S7.
PubMed Web of Science ®Google Scholar
Gale W, Oakden-Rayner L, Carneiro G, Bradley A P, Palmer L J. Producing radiologist-quality reports for interpretable artificial intelligence. arXiv:180600340 [cs] [Internet] 2018 Jun 1 [cited 2020 Nov 30]. Available from: http://arxiv.org/abs/1806.00340
Google Scholar
Gulshan V, Peng L, Coram M, Stumpe M C, Wu D, Narayanaswamy A, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 2016; 316(22): 2402.
PubMed Web of Science ®Google Scholar
Gulshan V, Rajan RP, Widner K, Wu D, Wubbels P, et al. Performance of a deep-learning algorithm vs manual grading for detecting diabetic retinopathy in India. JAMA Opthalmol 2019; 137(9): 987–93.
PubMed Web of Science ®Google Scholar
Hedlund J, Eklund A, Lundström C. Key insights in the AIDA community policy on sharing of clinical imaging data for research in Sweden. Scientific Data 2020; 7(1): 331.
PubMed Web of Science ®Google Scholar
Kamulegeya L H, Okello M, Bwanika J M, Musinguzi D, Lubega W, Rusoke D, et al. Using artificial intelligence on dermatology conditions in Uganda: a case for diversity in training data sets for machine learning. bioRxiv 2019 Oct 31; 826057.
Google Scholar
Kougia V, Pavlopoulos J, Androutsopoulos I. A survey on biomedical image captioning. arXiv:190513302 [cs] [Internet] 2019 May 26 [cited 2020 Dec 1]. Available from: http://arxiv.org/abs/1905.13302
Google Scholar
Lin C-Y. ROUGE: a package for automatic evaluation of summaries. In: Text Summarization Branches Out [Internet]. Barcelona, Spain: Association for Computational Linguistics; 2004 [cited 2020 Nov 30]. p. 74–81. Available from: https://www.aclweb.org/anthology/W04-1013
Google Scholar
Liu X, Cruz Rivera S, Moher D, Calvert M J, Denniston A K. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. Nat Med 2020a; 26(9): 1364–74.
PubMed Web of Science ®Google Scholar
Liu X, Rivera S C, Faes L, Keane P A, Moher D, Calvert M, et al. CONSORT-AI and SPIRIT-AI: new reporting guidelines for clinical trials and trial protocols for artificial intelligence interventions. Invest Ophthalmol Vis Sci 2020b; 61(7): 1617–1617.
Web of Science ®Google Scholar
Lupton M. Some ethical and legal consequences of the application of artificial intelligence in the field of medicine. Trends Med [Internet] 2018 [cited 2020 Oct 17]; 18(4). Available from: https://www.oatext.com/some-ethical-and-legal-consequences-of-the-application-of-artificial-intelligence-in-the-field-of-medicine.php
Google Scholar
Manning C. Artificial intelligence definitions [Internet] 2020 [cited 2020 Nov 26]. Available from: https://hai.stanford.edu/sites/default/files/2020-09/AI-Definitions-HAI.pdf
Google Scholar
Michie D, Spiegelhalter D J, Taylor C C. Machine learning, neural and statistical classification 1994 [cited 2016 Dec 7]. Available from: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.27.355
Google Scholar
Mittelstadt B D, Allo P, Taddeo M, Wachter S, Floridi L. The ethics of algorithms: mapping the debate. Big Data & Society 2016; 3(2): 2053951716679679.
Web of Science ®Google Scholar
Nicolaes J, Raeymaeckers S, Robben D, Wilms G, Vandermeulen D, Libanati C, et al. Detection of vertebral fractures in CT using 3D convolutional neural networks. arXiv:191101816 [cs, eess] [Internet] 2019 Nov 5 [cited 2020 Oct 9]. Available from: http://arxiv.org/abs/1911.01816
Google Scholar
Obuchowski N A, Lieber M L, Powell K A. Data analysis for detection and localization of multiple abnormalities with application to mammography. Acad Radiol 2000; 7(7): 516–25.
PubMed Web of Science ®Google Scholar
Olczak J, Emilson F, Razavian A, Antonsson T, Stark A, Gordon M. Ankle fracture classification using deep learning: automating detailed AO Foundation/Orthopedic Trauma Association (AO/OTA) 2018 malleolar fracture identification reaches a high degree of correct classification. Acta Orthop 2021; 92(1): 102–8.
PubMed Web of Science ®Google Scholar
Pandis N, Fedorowicz Z. The International EQUATOR Network: enhancing the quality and transparency of health care research. J Appl Oral Sci 2011; 19(5): 0.
PubMed Web of Science ®Google Scholar
Papineni K, Roukos S, Ward T, Zhu W-J. BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics [Internet]. Stroudsburg, PA: Association for Computational Linguistics; 2002 [cited 2016 Nov 3]. p. 311-318. (ACL ’02). Available from: http://dx.doi.org/10.3115/1073083.1073135
Google Scholar
Paul D, Sanap G, Shenoy S, Kalyane D, Kalia K, Tekade R K. Artificial intelligence in drug discovery and development. Drug Discovery Today [Internet] 2020 Oct 21 [cited 2020 Nov 26]. Available from: http://www.sciencedirect.com/science/article/pii/S1359644620304256
Web of Science ®Google Scholar
Pocevičiūtė M, Eilertsen G, Lundström C. Survey of XAI in digital pathology. In: Holzinger A, Goebel R, Mengel M, Müller H, editors. Artificial intelligence and machine learning for digital pathology. Lecture Notes in Computer Science, vol. 12090, 2020.
Google Scholar
Qi Y, Zhao J, Shi Y, Zuo G, Zhang H, Long Y, et al. Ground truth annotated femoral X-ray image dataset and object detection based method for fracture types classification. IEEE Access 2020; 8: 189436–44.
Web of Science ®Google Scholar
Rivera S C, Liu X, Chan A-W, Denniston A K, Calvert M J, Ashrafian H, et al. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension. Lancet Digital Health 2020; 2(10): e549-60.
Google Scholar
Rudin C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence 2019; 1(5): 206–15.
PubMedGoogle Scholar
Saito T, Rehmsmeier M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS One 2015; 10(3): e0118432.
PubMed Web of Science ®Google Scholar
Sendak M P, Gao M, Brajer N, et al. Presenting machine learning model information to clinical end users with model facts labels. NPJ Digit Med 2020; 3(41). https://doi.org/https://doi.org/10.1038/s41746-020-0253-3
Google Scholar
Shim E, Kim J Y, Yoon J P, Ki S-Y, Lho T, Kim Y, et al. Automated rotator cuff tear classification using 3D convolutional neural network. Scientific Reports 2020; 10(1): 15632.
PubMed Web of Science ®Google Scholar
Vedantam R, Zitnick C L, Parikh D. CIDEr: fonsensus-based image description evaluation. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2015. p. 4566–75.
Google Scholar

Share icon
Back to Top

Related research

People also read lists articles that other readers of this article have read.

Recommended articles lists articles that we recommend and is powered by our AI driven recommendation engine.

Cited by lists all citing articles based on Crossref citations.
Articles with the Crossref icon will open in a new tab.

People also read
Recommended articles
Cited by

To cite this article:

Reference style: APA Chicago Harvard

Citation copied to clipboard

Reference styles above use APA (6th edition), Chicago (16th edition) & Harvard (10th edition)

Download citation

Download a citation file in RIS format that can be imported by citation management software including EndNote, ProCite, RefWorks and Reference Manager.

Choose format: RIS BibTex RefWorks Direct Export

Choose options: Citation Citation & abstract Citation & references

Presenting artificial intelligence, deep learning, and machine learning studies to clinicians and healthcare stakeholders: an introductory reference with a guideline and a Clinical AI Research (CAIR) checklist proposal

Information for

Open access

Opportunities

Help and information

Your download is now in progress and you may close this window

Login or register to access this feature

Presenting artificial intelligence, deep learning, and machine learning studies to clinicians and healthcare stakeholders: an introductory reference with a guideline and a Clinical AI Research (CAIR) checklist proposal

Related research

To cite this article:

Download citation

Information for

Open access

Opportunities

Help and information

Keep up to date