Review

Artificial Intelligence (AI) for the early detection of breast cancer: a scoping review to assess AI’s potential in breast screening practice

Pages 351-362 | Received 04 Feb 2019, Accepted 18 Apr 2019, Published online: 03 May 2019

ABSTRACT

Introduction: Various factors are driving interest in the application of artificial intelligence (AI) for breast cancer (BC) detection, but it is unclear whether the evidence warrants large-scale use in population-based screening.

Areas covered: We performed a scoping review, a structured evidence synthesis describing a broad research field, to summarize knowledge on AI evaluated for BC detection and to assess AI’s readiness for adoption in BC screening. Studies were predominantly small retrospective studies based on highly selected image datasets containing a high proportion of cancers (median BC proportion in datasets 26.5%), and used heterogeneous techniques to develop AI models; the estimated AUC (area under the ROC curve) for AI models ranged from 69.2% to 97.8% (median AUC 88.2%). We identified various methodologic limitations, including use of non-representative imaging data for model training, limited validation in external datasets, potential bias in training data, and few comparative data for AI versus radiologists’ interpretation of mammography screening.

Expert opinion: Although contemporary AI models have reported generally good accuracy for BC detection, methodological concerns and evidence gaps exist that limit translation into clinical BC screening settings. These should be addressed in parallel with advancing AI techniques to render AI transferable to large-scale population-based screening.

1. Introduction and aims

Intelligent computer systems have existed, and have made their mark on society, for several decades. Interest in artificial intelligence (AI) research and development spans the technology, communication, industry, health, and government sectors, including security and defense [Citation1]. At present, the convergence of novel AI techniques, massive computer processing capabilities, and the widespread digital capture and storage of data, in science and health as elsewhere, is transforming the application of AI in diverse areas. In cancer, as in other areas of healthcare, AI systems are being developed, explored, and evaluated for disease detection, for prognostication, and as support strategies for clinical decision-making.

In the context of breast cancer, ongoing research using AI for early detection includes a global effort to develop advanced machine learning algorithms for interpreting screening mammograms, with the potential to improve breast cancer screening by reducing false positives [Citation2,Citation3]. The potential application of AI in breast cancer diagnostics extends across imaging modalities and into pathology interpretation; for example, AI has been shown to augment identification of metastatic breast cancer in whole-slide images of sentinel lymph node biopsies [Citation4]. We focus on early detection of breast cancer in this work to gauge the potential role of contemporary AI systems in screening practice.

We performed a scoping review, a form of structured evidence synthesis (similar to a systematic review) describing a broad research field, with the aims of (a) identifying and summarizing current knowledge on the application of AI in the early detection of (screening for) breast cancer; (b) mapping key evidence concepts in the application of AI in breast screening, specifically whether AI has been evaluated as a stand-alone screening strategy or as a complement (aid) to screen-reading (that is, an aid to human interpretation of mammograms), to determine transferability to the screening context; and (c) defining gaps in the available evidence to highlight areas meriting more research. The scoping review did not aim to assess technical or statistical aspects of the development of AI models and strategies; rather, it focused on the evidence from applied research using AI techniques, to determine readiness for real-world breast screening practice or screening trials and to inform future research in the AI space as it relates to breast cancer screening.

2. Methods

We performed a scoping review to assess and summarize, in a structured manner, the evidence on the use of AI in breast cancer detection. We anticipated a range of study designs exploring various AI methods in different applied contexts in breast cancer detection; we therefore undertook a scoping review to address this broad research area, since, given the heterogeneity of research in this field, conventional data synthesis using standard systematic reviews or meta-analysis would not be appropriate [Citation5]. Scoping reviews allow evidence mapping and synthesis from a variety of studies and sources to address broad research questions and to identify evidence gaps [Citation5,Citation6]. To develop the methods of the scoping review, we considered a framework and recommendations on scoping review methodology [Citation5–Citation7] as well as a reporting checklist specific to scoping reviews (PRISMA-ScR, an extension of PRISMA) [Citation8].

2.1. Literature search and eligible studies

A literature search was conducted (2010–2018) as shown in Appendices 1–2; the search timeframe was chosen to account for advances in AI methods and capabilities. The review focused on summarizing the evidence on the application of AI in breast cancer detection (screening) without study design restriction. Studies were eligible for inclusion in our review on the basis of the following criteria: (a) the purpose of the study was to assess an AI approach or strategy in breast cancer screening or detection; (b) the study reported quantitative data on performance (accuracy) or on screening or clinical outcome measures for the AI approach relative to a reference standard and/or an established comparator (for example, an ascertained database or radiologists’ interpretation); and (c) the evaluation was undertaken in women, or in screening examinations from women, without being restricted solely to women with breast cancer or to those who had had tissue biopsy. Studies were not eligible if they evaluated AI in phantom (simulated) lesions or in simulation models; if they described AI techniques or compared data-mining algorithms (or dealt with the development thereof) without application as described in the inclusion criteria; if they did not provide information on the number of subjects, screens, or images included; or if they were based on fewer than 100 subjects (or fewer than 200 images, if multiple images were used from an undeclared number of subjects), as this would not yield reliable information for testing of AI strategies in the context of breast screening. Commentary or editorial articles, review articles, and congress abstracts were not eligible for inclusion.

Literature searching and abstract screening to identify potentially eligible studies were performed by one investigator (NH); selection of eligible studies based on the above criteria is shown in the flow diagram (Appendix 1).

2.2. Data extraction and collation

Study-specific information and data were extracted into an evidence table to summarize the following: purpose or aims of the study; design and methods (including the amount and type of data); source population or subjects (and whether the study included consecutive screens or subjects); reference standard and/or comparator (if any); class of AI technique; validation (if done); and the main findings on accuracy or on screening or clinical outcomes (where reported). Formal quality appraisal is not routinely done in scoping reviews; however, methodological variables were considered in the extracted information to provide an understanding of the quality of the evidence. Extraction of information from eligible studies was based on independent double extraction by two investigators (from NN, GKJ, and NH) using a pre-defined extraction form; discussion and consensus were used to cross-check the extracted information and to resolve disagreement.

The information collated in the evidence tables was used to define the main themes of research and to elucidate the extent to which published evidence to date transfers to breast cancer screening. Descriptive statistics (median, range) were used to summarize quantitative information where it was reported by a majority of the eligible studies; where studies reported several estimates for accuracy measures, the median of the reported range was used for that study.
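As a concrete illustration of this summarization rule, the minimal Python sketch below (using invented AUC values, not the extracted study data) shows how per-study estimates are collapsed into a single value before the cross-study median and range are computed.

```python
# Minimal sketch of the collation rule described above, using invented
# AUC values (not the actual extracted study data).
from statistics import median

# Each inner list holds all AUC estimates (%) reported by one
# hypothetical study.
reported_aucs = [
    [88.2],              # study reporting a single estimate
    [69.2, 75.0, 81.4],  # study reporting a range of estimates
    [90.1, 97.8],
]

# Where a study reported several estimates, the median of the reported
# range serves as that study's single summary value.
per_study = [median(aucs) for aucs in reported_aucs]

print("Cross-study median AUC: %.1f%%" % median(per_study))
print("Cross-study range: %.1f%%-%.1f%%" % (min(per_study), max(per_study)))
```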

3. Results

There were 23 eligible studies [Citation9–Citation31] based on the literature search strategy (additional details and excluded studies [Citation32–Citation47] are shown in Appendix 1). A summary of the eligible studies, including study characteristics, is shown in Table 1, and study findings are reported in Table 2. There were no prospective screening trials or randomized studies. Studies were predominantly retrospective, using publicly available or institutional image datasets (Table 1), and the same datasets (or selected subsets thereof) were often used in several studies; however, Parmeggiani et al. [Citation28] reported a prospective cohort study from an institutional screening program, and Ayer et al. [Citation31] reported a retrospective study using a large, prospectively collected, well-defined mammographic database. As shown in Table 1, studies were generally based on relatively modest numbers of images (and hence smaller numbers of subjects), except for the studies from Kooi et al. and Ayer et al. [Citation18,Citation31], which investigated AI systems using relatively large datasets (>40,000 images, or mammographic examinations from >9,000 women). Most studies provided limited information on the methods used to assemble the source imaging datasets and the extent to which these were verified against a reference standard, with many studies simply citing the source image dataset [Citation10,Citation12,Citation13,Citation19,Citation20,Citation22–Citation24,Citation26,Citation27,Citation30]. However, several studies described an appropriate reference standard that included histopathology with either clinical follow-up or cancer registry matching to ascertain outcomes [Citation9,Citation14,Citation16,Citation18,Citation21,Citation28,Citation31].

Table 1. Summary of the characteristics of studies reporting on artificial intelligence (AI) in breast cancer detection.

Table 2. Summary of the findings of studies reporting on artificial intelligence (AI) in breast cancer detection.

Studies proposed to develop and/or evaluate AI models or techniques for breast cancer detection [Citation9,Citation11,Citation18,Citation21,Citation22,Citation26–Citation28], or for diagnosis (classification) or interpretation of mammographic examinations [Citation13–Citation16,Citation20,Citation23–Citation25,Citation30], or dealt with advancing computer-aided detection (CAD) systems through new AI models [Citation10,Citation12,Citation17,Citation19,Citation29]; one study investigated AI for discrimination between benign and cancerous lesions jointly with cancer risk prediction [Citation31]. Rodriguez-Ruiz et al. [Citation9] reported a multi-reader study comparing an AI system with radiologists’ interpretation of various datasets of screening and clinical mammographic examinations. All studies were based on mammographic images, except for the study from Becker et al. (which used ultrasound scans) [Citation14] and the study from Parmeggiani et al. (which combined ultrasound and mammography screening) [Citation28].

The reported breast cancer proportion ranged between 0.80% and 55.0% across studies reporting this variable, with a median cancer proportion of 26.5% [Citation9,Citation10,Citation12,Citation14,Citation16,Citation18,Citation19,Citation21–Citation31]. With the exception of the study from Ayer et al. [Citation31], studies either did not include consecutive screens or subjects (with many reporting selection of cases with abnormalities) or did not report whether consecutive screens were included or the extent of exclusions. This is consistent with the generally high proportion of cancers described for the datasets used to develop AI models across studies (Table 1), with the exception of the work from Ayer et al. [Citation31].

A brief summary of the AI methods (type of AI and validation) and study-specific results is shown in Table 2. The AI techniques were heterogeneous, but models primarily developed using convolutional neural networks (CNNs) predominated, and AI models generally achieved good accuracy (Table 2). Most of the studies incorporated a validation process (frequently cross-validation) when training AI models or reported results of model testing, generally using subsets of images that were not used for training, or by augmenting (modifying) the image datasets to allow testing of the developed model. However, few studies undertook an external validation of the developed AI model using an independent dataset (study-specific details are shown in Table 2).
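To make this internal validation pattern concrete, the sketch below runs stratified k-fold cross-validation scored by AUC; it is a minimal illustration using scikit-learn with a simple stand-in classifier and synthetic feature data (our assumptions, for illustration only), since the reviewed studies trained heterogeneous CNNs on image datasets.

```python
# Minimal sketch of stratified k-fold cross-validation scored by AUC,
# the internal validation pattern most eligible studies reported.
# A stand-in classifier and synthetic features are used here; the
# reviewed studies trained heterogeneous CNNs on mammographic images.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic data: 1,000 "exams" with ~26.5% cancer-positive labels,
# mimicking the enriched datasets typical of the reviewed studies.
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.735], random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
aucs = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                       cv=cv, scoring="roc_auc")
print("Per-fold AUC:", np.round(aucs, 3))
print("Mean AUC: %.3f" % aucs.mean())
```

Note that cross-validation of this kind estimates internal validity only; it does not substitute for testing on an independent external dataset.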

We did not identify any studies that reported clinical outcomes or conventional breast cancer screening metrics (such as cancer detection rates or recall rates). The most consistently reported measure of accuracy for the AI models (Table 2) was the area under the receiver-operating characteristic (ROC) curve, a global measure of accuracy that incorporates the trade-off between sensitivity and specificity: the AUC across studies ranged between 69.2% and 97.8% [Citation9–Citation12,Citation14–Citation16,Citation18–Citation27,Citation30,Citation31], with a median AUC of 88.2%. Several studies reported a range of estimates, depending on the techniques used within the study or on whether results from the training or validation data (or both) were reported (Table 2). Other study-specific results (accuracy, sensitivity, specificity) that were not consistently reported by most studies are also shown in Table 2. Very few studies reported comparisons between AI and human readers: the five studies that did so [Citation9,Citation14,Citation16,Citation18,Citation31] showed mixed findings for AUC, sensitivity, and specificity; study-specific results are shown in Table 2.
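For reference, the AUC summarizes the ROC curve, which plots the true-positive rate (TPR, sensitivity) against the false-positive rate (FPR, 1 − specificity) across all decision thresholds, and equals the probability that a randomly chosen cancer case receives a higher model score than a randomly chosen non-cancer case:

$$\mathrm{AUC} \;=\; \int_{0}^{1} \mathrm{TPR}\,\mathrm{d}(\mathrm{FPR}) \;=\; \Pr\left(s_{\text{cancer}} > s_{\text{normal}}\right),$$

where $s$ denotes the model’s output score; an AUC of 50% corresponds to chance-level discrimination and 100% to perfect separation. AUC is insensitive to cancer prevalence, which is one reason it can be estimated from enriched datasets, but it does not convey prevalence-dependent measures such as positive predictive value.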

4. Discussion

We report a scoping review, a form of structured evidence collation used to address a broad research question, assessing the evidence on AI systems evaluated for breast cancer detection (published since 2010) to gauge AI’s potential role in breast screening. We specifically looked for evidence on how the AI models performed and whether there were data comparing their accuracy to human readers in a breast screening context. Available studies indicate a potential role for AI in this clinical scenario; however, there are evidence gaps relevant to future evaluation and application of AI in breast cancer screening.

We found that the published evidence on AI for breast cancer detection was concentrated around model (algorithmic) development, generally independently of real-world clinical or screening evaluation, and overall the evidence does not indicate the readiness of AI systems for real-world breast screening trials or for stand-alone screen-reading. We arrived at that conclusion despite encouraging results for the performance of the AI models, highlighted in the range of reported model AUCs (69.2% to 97.8% across studies, median AUC 88.2%), because there are key evidence gaps that need to be addressed before AI can be rendered more transferable to large-scale screening evaluations. Our conclusion takes into consideration both the rationale for undertaking the scoping review and the methodological concerns we have identified through our work (described in the remainder of the Discussion) that are relevant for future studies in this field.

Several factors prompted us to undertake this work: first, there are large-scale projects developing AI for breast cancer screening [Citation3]; second, the data and statistical sciences driving AI development have advanced substantially in recent years, as have digital imaging data capture and archiving; third, mammography, the only imaging modality to date shown to reduce breast cancer mortality, has evolved into digital breast tomosynthesis (DBT, or 3D mammography), a technology that contains richer imaging data than conventional mammography; and fourth, resourcing screen-reading is an increasing burden in population-based screening programs that practice double-reading of mammography. In combination, these factors support a rationale for AI as a candidate technology for future breast screening practice. Hence, we sought to assess the published literature to gauge the readiness of recently investigated AI systems for breast screening application and to inform research directions in this field. We identified several concerns relating to the quality, depth, and representativeness of imaging datasets used to train models, as well as limited comparative data (AI versus human readers), that affect both the applicability and robustness of developed models and raise the possibility of bias. These issues merit attention from researchers developing and evaluating AI models and systems with the intention of deploying these in breast cancer screening practice. We also note that only one of the studies included in this scoping review evaluated a commercially available AI system in a reader study format [Citation9], and there are no prospective evaluations reported in clinical practice settings. This suggests that real-world implementation studies of AI in breast screening may be lagging behind developments in the AI industry or may not yet be available in the peer-reviewed literature.

First, the majority of studies used relatively small datasets, frequently training models on the same source datasets or on selected subsets thereof; and many of the eligible studies provided limited information on the methods used to verify the source datasets against a reference standard. Most of these imaging datasets were enriched with malignant lesions, with studies often selecting images containing suspicious abnormalities. This is reflected in the high percentage of breast cancer in the datasets used to train AI algorithms in the majority of studies (median 26.5%). Whereas this approach supports the feasibility of conducting the ‘experiment’ and developing an AI model, the resulting model performance has unclear (uncertain) applicability to a real-world screening setting, where only around 0.5%–0.8% of screens will contain cancer. Only the study from Ayer et al. [Citation31] had a cancer prevalence approximating that encountered in screening practice, and that study differed from the other studies because it focused on the combination of classification (of mammography findings) and risk prediction. The use of small, cancer-enriched datasets is a methodologic concern that raises substantial uncertainty regarding the quality of the imaging data used to train AI models and limits the applicability (external validity) of the findings beyond the reported experiment. The over-representation of malignant lesions in the imaging samples would be expected to affect reported measures of AI model performance, potentially over-estimating accuracy. Second, the majority of studies did not undertake validation of the developed AI model using an independent external dataset (and the few that did so used small, selected datasets), raising further uncertainty regarding the transferability of model performance to breast cancer screening.
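To illustrate why performance reported on cancer-enriched datasets may not transfer, the sketch below applies Bayes’ rule to a hypothetical model with assumed operating characteristics (90% sensitivity and 90% specificity, chosen for illustration only): the positive predictive value (PPV) collapses when prevalence drops from an enriched 26.5% to a screening-typical 0.7%.

```python
# Illustration with hypothetical numbers: the same sensitivity and
# specificity yield very different positive predictive values (PPV)
# at cancer-enriched versus real-world screening prevalence.
def ppv(sensitivity, specificity, prevalence):
    """PPV via Bayes' rule: P(cancer | positive result)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

SENS, SPEC = 0.90, 0.90  # assumed values, for illustration only
for prev in (0.265, 0.007):  # enriched dataset vs ~0.7% screening prevalence
    print("Prevalence %5.1f%% -> PPV %4.1f%%"
          % (100 * prev, 100 * ppv(SENS, SPEC, prev)))
```

Under these assumed operating characteristics, PPV falls from roughly 76% to about 6%, even though sensitivity, specificity, and AUC are unchanged.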

We found that studies were mostly focused on describing, refining, enhancing, and diversifying the AI techniques and algorithms, with little attention given to whether (or how) the imaging datasets used to train and test the AI models were representative of images encountered routinely in the breast screening context, and whether AI models were capable of recognizing the common ‘normality’ inherent in the screening scenario. AI algorithms may perform differently in different patient populations, given heterogeneity in breast cancer risk factors, and potentially in imaging features, between populations. This limitation suggests that larger validation datasets, preferably from diverse screening environments and populations, are required for promising AI algorithms to progress to the next step of clinical development. As evidenced by the high proportion of cancerous images in the data sources used thus far, the imaging data may not be representative of the real-world screening setting and may additionally be biased owing to deviation from the spectrum of findings usually seen in breast screening. Bias in datasets used to train AI algorithms is likely to carry over when those algorithms are applied in screening practice, or may lead to non-robust models, not because of poor algorithmic science but because of unbalanced imaging datasets. This problem may be magnified by the small sample sizes of imaging datasets in most of the studies, with the exception of two studies that trained AI models using larger datasets [Citation18,Citation31].

Third, there were limited data on AI versus human interpretation of breast screening examinations. Only five studies reported comparative estimates of accuracy for AI and radiologists [Citation9,Citation14,Citation16,Citation18,Citation31], and those studies generally showed that the AI models achieved accuracy measures approximating those of radiologists (Table 2). One of the largest studies, from Kooi et al. [Citation18] and based on imaging sets from a Dutch screening program, showed a high AUC for the AI model (92.9%); however, at the testing phase the model’s AUC (85.2%) was significantly lower than the mean AUC for radiologists (91.1%). Future studies should compare AI algorithms to radiologists’ performance in unselected screening examinations, or report the incremental improvement when AI algorithms are combined with radiologists’ interpretive performance. It may be that AI algorithms detect different findings than human interpreters, and vice versa, but this cannot be determined from the currently available studies. We also searched the eligible studies for clinical outcome measures or conventional breast screening metrics (such as cancer detection rates or recall rates) but did not identify any data on these outcomes, and none of the studies attempted to canvass women’s or societal perspectives on the acceptability of AI. We also noted that none of the abstracts retrieved in the literature search addressed the latter issues. Research into women’s or societal perspectives is likely beyond the scope of studies evaluating AI models for breast cancer detection; however, these issues merit consideration in future AI research.

Finally, all the currently published studies meeting our inclusion criteria developed AI models using data from screen-film or digital mammography. However, as DBT is progressively becoming the breast screening modality of choice, future AI studies should include imaging data from DBT screening. AI algorithms that are only developed and validated using conventional (2D) mammography data may be outdated by the time of clinical adoption, as more than half of screening facilities in the USA now have DBT capability [Citation48]. Moreover, DBT represents volumetric data from multiple summed 2D imaging slices, with the prospect of providing a much larger amount of quantitative imaging data that could further improve AI algorithm performance. Therefore, future testing and validation imaging sets should include DBT screening examinations linked to radiologist performance and cancer outcomes data.

There are limitations to our scoping review. We focused on published studies from 2010 onwards to factor in advances in AI capabilities, such as deep learning; therefore, we did not review older studies that paved the way for more recent AI studies. We did not attempt to detail the AI techniques or computational methods reported in the eligible studies beyond the basic details shown in Table 2; we recognize the heterogeneity of AI systems and that this affects model performance, but this was beyond the scope of our review, which aimed to gauge the readiness of AI for screening application rather than to describe the detailed techniques of AI models. Finally, as for any structured review, we had pre-specified inclusion criteria; hence, some studies (such as those restricted solely to cases who had biopsy) were not eligible for inclusion in this scoping review.

4.1. Conclusions

Our scoping review of studies of AI for breast cancer detection showed predominantly retrospective studies based on relatively small and highly selected image datasets, and identified methodologic limitations that detract from the applicability of AI systems in the breast screening setting. Although the reviewed studies used novel techniques and reported encouraging results for AI model accuracy, the methodologic issues highlighted in our work (such as the use of imaging data that may not represent the screening setting, the potential for bias in model training, and the lack of comparative data) can inform future studies and improve the translation of AI systems into breast cancer screening practice.

4.2. Expert opinion

We foresee that several factors, in combination, will continue to drive growing interest in the development of AI approaches for routine breast cancer screening. These factors include advances in the AI sciences, including increased computing power and cloud storage of large amounts of imaging data, as well as a genuine need to improve breast screening outcomes, such as reducing false-positive mammography screening results. Moreover, AI approaches that can help decrease human workload would improve screening practice in resource-limited screening settings and in population breast screening programs that currently rely on double-reading.

Beyond improved techniques for training and validating dedicated AI models for mammography screening, large prospective studies will be needed to evaluate developed AI models using a mix of screening examinations that represents real-world screening scenarios (in terms of the spectrum of positive and negative imaging findings, and cancer prevalence in populations). Ideally, models should be validated using independent, large screening datasets from diverse populations, with input from imaging experts and those working in the screening environment, to ensure relevance and timely translation. Currently, such data exist in closed national screening programs with complete cancer capture. Ideally, these datasets, linked to ground truth, could be used to validate the many commercial AI algorithms that are likely to gain approval for direct consumer marketing over the next five years.

We believe that well-designed studies should be developed to compare AI algorithms to radiologists’ performance or to estimate the incremental improvement (or change) in accuracy when AI algorithms are combined with radiologists’ interpretations or substituted for one of two screen-readers. These studies should factor in the unexplored interaction between the AI algorithm output and the radiologists’ use of this additional information to arrive at an ultimate recommendation. The incremental improvement of AI in combination with human interpretation will be critical to organized screening programs that use double-reading, as an effective AI system could be a solution to radiologist shortages by creating a single-reader model with AI support.
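As a toy illustration of why the combination rule matters, the sketch below applies a ‘recall if either reader flags’ rule to hypothetical operating points, under the simplifying (and likely unrealistic) assumption that radiologist and AI errors are independent.

```python
# Toy model of AI as a second reader under a "recall if either reader
# flags" rule, assuming independent errors; all operating points are
# hypothetical.
def combine_or(sens_a, spec_a, sens_b, spec_b):
    """Disjunctive combination: positive if either reader is positive."""
    sens = 1 - (1 - sens_a) * (1 - sens_b)  # a cancer is missed only if both miss
    spec = spec_a * spec_b                  # a normal is cleared only if both clear
    return sens, spec

rad = (0.85, 0.92)  # hypothetical radiologist (sensitivity, specificity)
ai = (0.80, 0.90)   # hypothetical AI second reader
sens, spec = combine_or(*rad, *ai)
print("Radiologist alone: sensitivity %.1f%%, specificity %.1f%%"
      % (100 * rad[0], 100 * rad[1]))
print("Radiologist OR AI: sensitivity %.1f%%, specificity %.1f%%"
      % (100 * sens, 100 * spec))
```

In this toy example sensitivity rises (to 97%) while specificity falls (to about 83%), implying more recalls; arbitration or consensus rules, and any correlation between reader and algorithm errors, would change this trade-off.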

We also anticipate that future studies will soon develop and test models for the interpretation of DBT, to improve detection metrics and to ensure relevance for future population breast cancer screening practice. As DBT becomes a screening modality of choice in some programs, AI algorithms will have to adapt to the new imaging modality. In contrast, some screening programs may have only recently adopted digital mammography (DM), and changing hardware to DBT systems may be cost-prohibitive; the addition of a cost-effective AI algorithm to DM may demonstrate an incremental improvement in screening accuracy that approximates DBT performance and could be a more cost-effective solution for these programs.

Finally, we expect that future research in AI development and evaluation will progress in parallel with qualitative research that addresses the major knowledge gaps around the acceptability of using AI in breast cancer screening services, and the many ethical, social, and legal implications of its use in healthcare. In addition, from a big-picture perspective, if AI is adopted in breast screening practice, the benefits-and-harms trade-off inherent in population breast cancer screening will need to be reassessed to factor in the incremental benefits and harms, including unintended consequences, of using AI in lieu of human image interpretation.

Article highlights

  • This scoping review, a form of structured evidence synthesis describing a broad research field, summarizes knowledge from 23 studies that evaluated artificial intelligence (AI) for automated breast cancer (BC) detection

  • The majority were small retrospective studies that trained and tested AI models using cancer-enriched image datasets (median proportion cancer-positive 26.5%)

  • AI techniques were heterogeneous, but models were predominantly developed using convolutional neural networks (CNNs); most studies validated developed models (frequently using cross-validation), but few tested the model in an independent dataset

  • A consistently reported measure of accuracy for the AI models was the area under the receiver-operating characteristic curve (AUC): estimated AUC was 69.2–97.8% across studies

  • Methodological concerns include substantial uncertainty regarding the quality of the imaging data used to train AI models in terms of limited applicability (external validity) of developed models

  • There was potential for bias due to use of unbalanced imaging data (that does not represent the spectrum in real-world screening) to train and test models; hence, algorithms may not perform well when applied or tested in actual screening practice

  • There were limited comparative data on AI versus human interpretation of breast screening examinations

  • Current evidence is limited to AI algorithm development in conventional (2D) mammography; none of the studies used digital breast tomosynthesis (3D mammography) to train or test models

  • We identify current gaps in the evidence including the need for large prospective studies that develop and test AI using real-world screening data and more efforts in the clinical translation of AI systems into routine breast cancer screening practice.

Declaration of interest

CI Lee declares research grant funding (to institution) from GE Healthcare unrelated to the work in this manuscript. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.

Reviewer disclosures

Peer reviewers on this manuscript have no relevant financial or other relationships to disclose.

Additional information

Funding

This work was supported by a National Breast Cancer Foundation (NBCF Australia) Breast Cancer Research Leadership Fellowship to N Houssami; CI Lee was supported in part by the Safeway Foundation; G Kirkpatrick-Jones was supported by a Sydney Summer Scholarship from the Faculty of Medicine & Health, University of Sydney.

References

  • National Science and Technology Council (US). The national artificial intelligence research and development strategic plan. Washington (DC): Networking and Information Technology Research and Development Subcommittee, National Science and Technology Council; 2016. p. 1–40.
  • Houssami N, Lee CI, Buist DSM, et al. Artificial intelligence for breast cancer screening: opportunity or hype? Breast. 2017;36:31–33.
  • Trister AD, Buist D, Lee CI. Will machine learning tip the balance in breast cancer screening? JAMA Oncol. 2017;3:1463.
  • Wang D, Khosla A, Gargeya R, et al. Deep learning for identifying metastatic breast cancer. Beth Israel Deaconess Medical Center, Harvard Medical School; 2016. p. 1–6.
  • Peters MD, Godfrey C, Khalil H, et al. Guidance for conducting systematic scoping reviews. Int J Evid Based Healthc. 2015 Sep;13(3):141–146.
  • Colquhoun HL, Levac D, O’Brien KK, et al. Scoping reviews: time for clarity in definition, methods, and reporting. J Clin Epidemiol. 2014;67(12):1291–1294.
  • Arksey H, O’Malley L. Scoping studies: towards a methodological framework. Int J Soc Res Methodol. 2005;8(1):19–32.
  • Tricco AC, Lillie E, Zarin W, et al. PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med. 2018 Sep 4;169(7):467–473.
  • Rodriguez-Ruiz A, Lång K, Gubern-Merida A, et al. Stand-alone artificial intelligence for breast cancer detection in mammography: comparison with 101 radiologists. J Natl Cancer Inst. 2019;111(9):djy222.
  • Al-Masni MA, Al-Antari MA, Park J-M, et al. Simultaneous detection and classification of breast masses in digital mammograms via a deep learning YOLO-based CAD system. Comput Methods Programs Biomed. 2018;157:85–94.
  • Ribli D, Horváth A, Unger Z, et al. Detecting and classifying lesions in mammograms with deep learning. Sci Rep. 2018;8(1):4165.
  • Chougrad H, Zouaki H, Alheyane O. Deep convolutional neural networks for breast cancer screening. Comput Methods Programs Biomed. 2018;157:19–30.
  • Bandeira Diniz JO, Bandeira Diniz PH, Azevedo Valente TL, et al. Detection of mass regions in mammograms by bilateral analysis adapted to breast density using similarity indexes and convolutional neural networks. Comput Methods Programs Biomed. 2018;156:191–207.
  • Becker AS, Mueller M, Stoffel E, et al. Classification of breast cancer in ultrasound imaging using a generic deep learning analysis software: a pilot study. Br J Radiol. 2018;91(1083):20170576. PubMed PMID: 29215311.
  • Lotter W, Sorensen G, Cox D. A multi-scale CNN and curriculum learning strategy for mammogram classification. In: Deep learning in medical image analysis and multimodal learning for clinical decision support. Springer; 2017. p. 169–177.
  • Becker AS, Marcon M, Ghafoor S, et al. Deep learning in mammography: diagnostic accuracy of a multipurpose image analysis software in the detection of breast cancer. Invest Radiol. 2017;52(7):434–440. PubMed PMID: 00004424-201707000-00007.
  • de Oliveira Silva LC, Barros AK, Lopes MV. Detecting masses in dense breast using independent component analysis. Artif Intell Med. 2017;80:29–38.
  • Kooi T, Litjens G, van Ginneken B, et al. Large scale deep learning for computer aided detection of mammographic lesions. Med Image Anal. 2017;35:303–312.
  • Samala RK, Chan H-P, Hadjiiski LM, et al. Multi-task transfer learning deep convolutional neural network: application to computer-aided diagnosis of breast cancer on mammograms. Phys Med Biol. 2017;62(23):8894–8908. PubMed PMID: 29035873.
  • Carneiro G, Nascimento J, Bradley AP. Automated analysis of unregistered multi-view mammograms with deep learning. IEEE Trans Med Imaging. 2017;36(11):2355–2365.
  • Teare P, Fishman M, Benzaquen O, et al. Malignancy detection on mammography using dual deep convolutional neural networks and genetically discovered false color input enhancement. J Digit Imaging. 2017;30(4):499–505. PubMed PMID: 28656455.
  • Dhungel N, Carneiro G, Bradley AP. A deep learning approach for the analysis of masses in mammograms with minimal user intervention. Med Image Anal. 2017;37:114–128.
  • Sun W, Tseng T-L, Zhang J, et al. Enhancing deep convolutional neural network scheme for breast cancer diagnosis with unlabeled data. Computerized Med Imaging Graphics. 2017;57:4–9.
  • Saraswathi D, Srinivasan E. A CAD system to analyse mammogram images using fully complex-valued relaxation neural network ensembled classifier. J Med Eng Technol. 2014 Oct 01;38(7):359–366.
  • Velikova M, Lucas PJF, Samulski M, et al. On the interplay of machine learning and background knowledge in image interpretation by Bayesian networks. Artif Intell Med. 2013 Jan 01;57(1):73–86.
  • Dheeba J, Tamil Selvi S. An improved decision support system for detection of lesions in mammograms using differential evolution optimized wavelet neural network. J Med Syst. 2012;36(5):3223–3232.
  • Dheeba J, Selvi ST. A swarm optimized neural network system for classification of microcalcification in mammograms. J Med Syst. 2012;36(5):3051–3061.
  • Parmeggiani D, Avenia N, Sanguinetti A, et al. Artificial intelligence against breast cancer (A.N.N.E.S-B.C.-Project). Ann Ital Chir. 2012 Jan-Feb;83(1):1–5. PubMed PMID: 22352208; eng.
  • Lesniak JM, Hupse R, Blanc R, et al. Comparative evaluation of support vector machine classification for computer aided detection of breast masses in mammography. Phys Med Biol. 2012;57(16):5295.
  • Huang M-L, Hung Y-H, Lee W-M, et al. Usage of case-based reasoning, neural network and adaptive neuro-fuzzy inference system classification techniques in breast cancer dataset classification diagnosis. J Med Syst. 2012;36(2):407–414.
  • Ayer T, Alagoz O, Chhatwal J, et al. Breast cancer risk estimation with artificial neural networks revisited: discrimination and calibration. Cancer. 2010;116(14):3310–3321. PubMed PMID: 20564067.
  • Cai H, Peng Y, Ou C, et al. Diagnosis of breast masses from dynamic contrast-enhanced and diffusion-weighted MR: a machine learning approach. PLOS ONE. 2014;9(1):e87387.
  • Chen H-L, Yang B, Wang G, et al. Support vector machine based diagnostic system for breast cancer using swarm intelligence. J Med Syst. 2012;36(4):2505–2519.
  • Dheeba J, Albert Singh N, Tamil Selvi S. Computer-aided detection of breast cancer on mammograms: a swarm intelligence optimized wavelet neural network approach. J Biomed Inform. 2014;49:45–52.
  • Hsieh S-L, Hsieh S-H, Cheng P-H, et al. Design ensemble machine learning model for breast cancer diagnosis. J Med Syst. 2012;36(5):2841–2847.
  • Huang M-L, Hung Y-H, Chen W-Y. Neural network classifier with entropy based feature selection on breast cancer diagnosis [journal article]. J Med Syst. 2010 Oct 01;34(5):865–873.
  • Kamra A, Jain VK, Singh S, et al. Characterization of architectural distortion in mammograms based on texture analysis using support vector machine classifier with clinical evaluation. J Digit Imaging. 2016;29(1):104–114. PubMed PMID: 26138756.
  • Liu B, Jiang Y. A multitarget training method for artificial neural network with application to computer-aided diagnosis. Med Phys. 2013;40(1):011908. PubMed PMID: 23298099.
  • Qiu Y, Yan S, Gundreddy RR, et al. A new approach to develop computer-aided diagnosis scheme of breast mass classification using deep learning technology. J Xray Sci Technol. 2017;25(5):751–763. PubMed PMID: 28436410.
  • Seokmin H, Ho-Kyung K, Ja-Yeon J, et al. A deep learning framework for supporting the classification of breast lesions in ultrasound images. Phys Med Biol. 2017;62(19):7714.
  • Tan M, Pu J, Zheng B. Optimization of breast mass classification using sequential forward floating selection (SFFS) and a support vector machine (SVM) model. Int J Comput Assist Radiol Surg. 2014;9(6):1005–1020. PubMed PMID: 24664267.
  • Venkatesh SS, Levenback BJ, Sultan LR, et al. Going beyond a first reader: a machine learning methodology for optimizing cost and performance in breast ultrasound diagnosis. Ultrasound Med Biol. 2015 Dec 01;41(12):3148–3162.
  • Wang J, Yang X, Cai H, et al. Discrimination of breast cancer with microcalcifications on mammography by deep learning. Sci Rep. 2016;6:27327. Available from: https://www.nature.com/articles/srep27327#supplementary-information
  • Wu W-J, Lin S-W, Moon WK. Combining support vector machine with genetic algorithm to classify ultrasound breast tumor images. Computerized Med Imaging Graphics. 2012 Dec 01;36(8):627–633.
  • Wu W-J, Lin S-W, Moon WK. An artificial immune system-based support vector machine approach for classifying ultrasound breast tumor images. J Digit Imaging. 2015;28(5):576–585.
  • Zhang Q, Xiao Y, Dai W, et al. Deep learning based classification of breast tumors with shear-wave elastography. Ultrasonics. 2016;72:150–157.
  • Kooi T, Ginneken B, Karssemeijer N, et al. Discriminating solitary cysts from soft tissue lesions in mammography using a pretrained deep convolutional neural network. Med Phys. 2017;44(3):1017–1027. doi:10.1002/mp.12110.
  • Houssami N, Miglioretti DL. Digital breast tomosynthesis: a brave new world of mammography screening. JAMA Oncol. 2016;2(6):725–727.

Appendices

Appendix 1. Literature search and study identification strategy – Artificial Intelligence (AI) for breast cancer detection


Appendix 2. Database search terms