ORIGINAL ARTICLES

Standardized assessment of tumor-infiltrating lymphocytes in breast cancer: an evaluation of inter-observer agreement between pathologists

Pages 90-94 | Received 15 Sep 2017, Accepted 01 Nov 2017, Published online: 23 Nov 2017

Abstract

Introduction: In breast cancer, there is a growing body of evidence that tumor-infiltrating lymphocytes (TILs) may have clinical utility and may be able to direct clinical decisions for subgroups of patients. Clinical utility is, however, not sufficient to warrant the implementation of a new biomarker in routine practice, and evaluation of the analytical validity is needed, including testing the reproducibility of decentralized assessment of TILs. The aim of this study was to evaluate the inter-observer agreement of TILs assessment using a standardized method, as proposed by the International TILs Working Group 2014, applied to a cohort of breast cancers reflecting an average breast cancer population.

Material and methods: Stromal TILs were assessed using full slide sections from 124 breast cancers with varying histology, malignancy grade and ER and HER2 status. TILs were estimated by nine dedicated breast pathologists using scanned hematoxylin–eosin-stained sections. TILs results were categorized using various cutoffs, and the inter-observer agreement was evaluated using the intraclass correlation coefficient (ICC), Kappa statistics and each observer's overall agreement with the median value of TILs.

Results: Evaluation of TILs led to an ICC of 0.71 (95% CI: 0.65–0.77), corresponding to an acceptable agreement. Kappa values were in the range of 0.38–0.46, corresponding to a fair to moderate agreement. The individual agreements increased when using only two categories (‘high’ vs. ‘low’ TILs) and a cutoff of 50–60%.

Discussion: The results of the present study are in accordance with previous studies and show that the proposed methodology for standardized evaluation of TILs renders an acceptable inter-observer agreement. The findings, however, indicate that assessment of TILs needs further refinement, and they support the latest St. Gallen Consensus that routine reporting of TILs for early breast cancer is not ready for implementation in a clinical setting.

Introduction

An inflammatory response of varying degree is observed in virtually all neoplasms. Since the 1940s, it has been speculated that the inflammatory response in breast cancer could have prognostic significance, based on the observation that medullary carcinomas, often accompanied by a pronounced lymphocytic infiltrate, seemed to be associated with an extremely good prognosis despite an otherwise poorly differentiated appearance [Citation1]. Later, lymphocytic infiltration was described as a prognostic variable in more frequent histological types of breast cancer [Citation2].

In comparison to, e.g., malignant melanoma and lung cancer, breast cancer is considered non-immunogenic, but especially triple negative (TNBC) and HER2-positive cancers have been shown to have higher levels of tumor-infiltrating lymphocytes (TILs) [Citation3], and a clinical impact of TILs in breast cancers has been shown to be particularly evident in these subtypes. The prognostic impact of TILs has been demonstrated in thousands of TNBC [Citation3–5] and HER2-positive cancers [Citation6], and a linear relationship has been described for TNBC and HER2-positive cancers, with a 15–20% reduction in distant metastasis rate and mortality for each 10% increase in TILs [Citation5]. A predictive impact has also been described, e.g., in the FinHer study, where patients with HER2-positive tumors with high TILs levels were found to derive greater benefit from trastuzumab than patients with low TILs levels, thus indicating a positive predictive impact of a preexisting host anti-tumor immune response [Citation7]. Finally, high levels of TILs have been associated with higher rates of pathological complete response (pCR) after neoadjuvant chemotherapy [Citation8,Citation9].

There is thus a growing body of evidence that TILs have clinical validity and utility in breast cancer and may be able to direct clinical decisions for a group of patients. Clinical utility is, however, not sufficient to warrant the implementation of a new biomarker in clinical practice, and a thorough evaluation of the analytical validity is needed in order to describe the robustness of the test, including accuracy and inter-observer reproducibility [Citation10].

The International TILs Working Group 2014 has formulated recommendations for evaluation of TILs in breast cancer [Citation11], and the inter-observer agreement between pathologists in the assessment of TILs has been examined for TNBC and HER2-positive cancers [Citation12,Citation13].

The purpose of this study was to evaluate the inter-observer agreement of TILs assessment using the standardized method as proposed by the International TILs Working Group 2014 [Citation11] applied to a cohort of breast cancers reflecting an average breast cancer population, including estrogen receptor (ER) positive and HER2 negative cancers.

Material and methods

Stromal tumor-infiltrating lymphocytes were assessed using full slide sections from a total of 124 breast cancers. The distribution of ER status, HER2 status, malignancy grade and histological type of the 124 carcinomas was comparable to an average Caucasian population (Table 1) [Citation14].

Table 1. Histopathological tumor characteristics.

From each formalin-fixed, paraffin-embedded (FFPE) block, one full slide, HE-stained section (3 µm thick) was scanned at 20× magnification using a Nanozoomer 2.0HT (Hamamatsu Photonics K.K., Hamamatsu City, Japan), and the scanned sections were uploaded and made available for analysis using the freely downloadable program NDP.view2 by Hamamatsu. Digitalized HE sections were chosen in order to facilitate distribution of the material, and evaluation of the scanned HE sections was considered comparable to evaluation of the actual glass slides by light microscopy. The evaluation was based on the recommendations from the TILs Working Group 2014 [Citation11]. In short, TILs encompass both lymphocytes and plasma cells. According to the international recommendations, only stromal TILs within the borders of the invasive tumor were evaluated, and areas of crush artifacts, necrosis, and previous core biopsy sites were excluded. The estimation was semiquantitative, assessing an average TILs score in the tumor area for the full section, with no evaluation of hotspots. TILs were recorded as a continuous variable, and the amount of TILs was subsequently categorized into various 2- or 3-grade categorizations using different cutoffs (A: 0–10%, 11–39%, ≥40%; B: 0–20%, 21–49%, ≥50%; C: <50% vs. ≥50% or D: <60% vs. ≥60%). Each TILs value describes the ‘area of stromal tissue occupied by TILs/total area of intra-tumoral stromal area’.
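The four categorizations above can be sketched as a small lookup, assuming TILs scores are expressed as percentages from 0 to 100. The names `SCHEMES` and `categorize` are illustrative and do not come from the study; the handling of fractional scores that fall between two listed bands (for example 10.5% under scheme A) is also an assumption, since the text only lists integer bands.

```python
# Cutoff schemes as listed in the text. Each entry holds the lower bounds
# of the second and any later category, in percent.
SCHEMES = {
    "A": (11, 40),  # 0-10%, 11-39%, >=40%
    "B": (21, 50),  # 0-20%, 21-49%, >=50%
    "C": (50,),     # <50% vs. >=50%
    "D": (60,),     # <60% vs. >=60%
}


def categorize(tils_percent, scheme):
    """Return the 0-based category index of a TILs score under a scheme.

    Assumption: a fractional score between two integer bands is assigned
    to the lower band (e.g. 10.5% falls in the first category of scheme A).
    """
    if not 0 <= tils_percent <= 100:
        raise ValueError("TILs score must lie between 0 and 100")
    bounds = SCHEMES[scheme]
    # Count how many lower bounds the score reaches or exceeds.
    return sum(tils_percent >= bound for bound in bounds)
```

For instance, a score of 55% is ‘high’ under scheme C but ‘low’ under scheme D, which is why the choice of cutoff matters for tumors near the 50–60% range.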

Nine dedicated breast pathologists evaluated the sections independently, and the results from each pathologist were kept confidential, with no individual feedback provided to the participants.

Statistical analysis

Inter-observer agreement for the evaluation of TILs reported as a continuous score was assessed via the intraclass correlation coefficient (ICC), calculated using a mixed model. Fleiss' kappa values (an adaptation of Cohen’s kappa for three or more raters) were used for assessing the inter-observer agreement when evaluating TILs as categorical data [Citation15]. For calculation of Kappa values and ICC, missing values were replaced by the mean of the measurements for the sample. Finally, a concordance analysis was performed by calculating for each pathologist the overall agreement with the median rating (number of samples in agreement/total number of samples), and this was repeated for the four different groupings of TILs (A: 0–10%, 11–39%, ≥40%; B: 0–20%, 21–49%, ≥50%; C: <50% vs. ≥50% or D: <60% vs. ≥60%). Kappa statistics are commonly used for assessing inter-rater agreement and were designed to take account of the possibility of agreement by guessing [Citation16], but the values are difficult to interpret. Similarly, no standard values for an acceptable agreement for ICC exist, but a statistically significant ICC of 0.70 has previously been used as an endpoint for a successful evaluation of TILs [Citation12,Citation17]. Percent agreement is a basic measurement of inter-observer agreement, in which the effect of chance in achieving agreement between raters is not accounted for. It was decided to report the different measures of inter-observer agreement in order to clarify the inter-observer variability from different angles. Statistical analysis was performed using Stata version 11.2 (StataCorp, College Station, TX).
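As an illustration of the categorical measure, the following is a minimal sketch of Fleiss' kappa in plain Python. It is not the Stata routine used in the study; the function name and input layout (one list of category indices per sample, one entry per rater) are assumptions made for the example.

```python
def fleiss_kappa(ratings, n_categories):
    """Fleiss' kappa for a fixed number of raters per sample.

    ratings: list of samples, each a list of 0-based category indices,
             one per rater (hypothetical input layout).
    """
    n_samples = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every sample needs the same number (>= 2) of ratings")

    # counts[i][c] = number of raters placing sample i in category c
    counts = [[row.count(c) for c in range(n_categories)] for row in ratings]

    # Observed agreement: mean over samples of the share of agreeing rater pairs.
    per_sample = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_sample) / n_samples

    # Chance agreement from the overall category proportions.
    proportions = [
        sum(row[c] for row in counts) / (n_samples * n_raters)
        for c in range(n_categories)
    ]
    p_chance = sum(p * p for p in proportions)
    if p_chance == 1:
        raise ValueError("kappa is undefined when only one category is used")

    return (p_observed - p_chance) / (1 - p_chance)
```

Because the chance term depends on how often each category is used overall, kappa can be low even when percent agreement is high, which is one reason the text reports several measures side by side.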

Results

The 124 cases evaluated by nine pathologists yielded 1107 observations, with nine of the 1116 possible values missing. The mean TILs values for each observer ranged from 10 to 23%, and some observers seemed to have a higher individual threshold or scaling (Supplementary Figure 1 shows the mean and standard deviation of the TILs values for each observer). However, single outliers also contributed to the inter-observer variability. Figure 1(A) shows how the observers occasionally reported a high or a low value in opposition to the rest of the observers, with a TILs value differing substantially from the mean value for that particular sample. Figure 1(B–D) illustrates three different tumors with varying mean values of TILs.
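The observation count and the handling of missing values described in the statistical analysis can be sketched as follows. Representing a missing rating as `None` and the helper name `impute_row_mean` are assumptions made for this example.

```python
N_CASES, N_RATERS, N_MISSING = 124, 9, 9

# Every pathologist rating every case gives the number of possible ratings;
# subtracting the missing ones gives the number of recorded observations.
possible = N_CASES * N_RATERS
recorded = possible - N_MISSING


def impute_row_mean(row):
    """Replace missing ratings (None) with the mean of the sample's other ratings."""
    observed = [value for value in row if value is not None]
    if not observed:
        raise ValueError("cannot impute a sample with no ratings at all")
    mean = sum(observed) / len(observed)
    return [mean if value is None else value for value in row]
```

Mean imputation keeps every sample in the analysis with a full set of nine values, at the cost of slightly understating disagreement for the affected samples.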

Figure 1. (A) Heat map showing graphically the individual recordings of TILs sorted in columns from left to right according to ascending mean stromal TILs values shown in the top row. The rows underneath represent the nine observers' recordings arranged from top to bottom according to increasing individual mean values. Single outliers are represented by, e.g., red pixels among otherwise green or yellow pixels or vice versa. (B) A carcinoma with a very low mean TILs level (0.03), (C) a carcinoma with a mean TILs level around the 50–60% cutoff (0.56) and (D) a carcinoma with a high mean TILs level (0.83). The black bars in (B–D) measure 200 µm (HE, original magnification 100×).

The ICC was 0.71 (95% CI: 0.65–0.77) (Table 2). The ICC describes the degree to which the variance in the measurements can be attributed to actual biological differences, as opposed to the variance introduced by different pathologists rating differently. The obtained ICC thus means that 71% of the variance in the present results can be attributed to inter-tumoral differences, while the remaining 29% is attributable to inter-observer variability. The pre-specified endpoint of an ICC statistically significantly above 0.70 was not completely met, since the 95% confidence interval included 0.70.
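To make the variance interpretation concrete, here is a minimal sketch of one common ICC form: the two-way random-effects, absolute-agreement, single-rater coefficient, often written ICC(2,1), computed from ANOVA mean squares. The text states only that a mixed model was used, so the choice of this particular form is an assumption, and the result of this sketch need not match the Stata output.

```python
def icc_2_1(scores):
    """ICC(2,1) from a complete samples-by-raters table of continuous scores.

    scores[i][j] is the score given to sample i by rater j (no missing values).
    """
    n = len(scores)        # number of samples
    k = len(scores[0])     # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with >= 2 samples and >= 2 raters")

    grand = sum(sum(row) for row in scores) / (n * k)
    sample_means = [sum(row) / k for row in scores]
    rater_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for the two-way layout without replication.
    ss_samples = k * sum((m - grand) ** 2 for m in sample_means)
    ss_raters = n * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_samples - ss_raters

    ms_samples = ss_samples / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Between-sample variance relative to total variance, where the total
    # includes systematic rater differences and residual error.
    return (ms_samples - ms_error) / (
        ms_samples + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )
```

In this form, systematic differences between raters (such as one observer consistently scoring higher) lower the coefficient, which matches the individual threshold and scaling differences discussed in the text.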

Table 2. Interobserver agreement in assessment of tumor-infiltrating lymphocytes (TILs).

Kappa values for the different categorizations of TILs (A: 0–10%, 11–39%, ≥40%; B: 0–20%, 21–49%, ≥50%; C: <50% vs. ≥50% or D: <60% vs. ≥60%) were 0.41, 0.36, 0.48 and 0.44, respectively (Table 2). This corresponds to a fair to moderate agreement according to the criteria by Landis and Koch [Citation18].
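The Landis and Koch bands can be written as a simple lookup. The band labels follow the published criteria; the function name and the treatment of values that fall between two-decimal band edges (for example 0.405) are assumptions made for this sketch.

```python
# Upper edge of each Landis and Koch band, paired with its label.
LANDIS_KOCH_BANDS = [
    (0.20, "slight"),
    (0.40, "fair"),
    (0.60, "moderate"),
    (0.80, "substantial"),
    (1.00, "almost perfect"),
]


def landis_koch(kappa):
    """Return the Landis and Koch strength-of-agreement label for a kappa value."""
    if kappa > 1:
        raise ValueError("kappa cannot exceed 1")
    if kappa < 0:
        return "poor"
    for upper, label in LANDIS_KOCH_BANDS:
        if kappa <= upper:
            return label


# Labels for the four kappa values reported in the text (schemes A to D).
reported = {"A": 0.41, "B": 0.36, "C": 0.48, "D": 0.44}
labels = {scheme: landis_koch(value) for scheme, value in reported.items()}
```

Under these bands, scheme B falls in the ‘fair’ range and schemes A, C and D in the ‘moderate’ range.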

Furthermore, the overall agreement between each observer’s recordings and the median value for the samples was calculated for each of the different categorizations. The mean values of these overall agreements are listed in Table 2. The agreements were poorest when dividing the TILs into 3-grade categories (0.79 and 0.82, respectively), whereas the overall agreement for each observer increased substantially when using only two categories (0.93 and 0.95, respectively).
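The concordance analysis for a dichotomous cutoff can be sketched as below. Whether the study categorized the median of the continuous scores or took the median of the categories is not stated; this example assumes the former, and the function name is hypothetical.

```python
from statistics import median


def agreement_with_median(scores, cutoff):
    """Per-rater share of samples where the rater's high/low call matches the median's.

    scores[i][j] is the TILs percentage from rater j on sample i;
    a score is called 'high' when it is >= cutoff.
    """
    n_raters = len(scores[0])
    hits = [0] * n_raters
    for row in scores:
        reference_is_high = median(row) >= cutoff
        for j, value in enumerate(row):
            if (value >= cutoff) == reference_is_high:
                hits[j] += 1
    return [h / len(scores) for h in hits]


# Toy data: four samples rated by three raters (values invented for illustration).
toy = [[10, 20, 70], [60, 65, 40], [5, 5, 5], [80, 55, 90]]
per_rater = agreement_with_median(toy, cutoff=50)
mean_agreement = sum(per_rater) / len(per_rater)
```

Note that each rater contributes to the median they are compared against, and that chance agreement is not corrected for, so this measure tends to read higher than kappa on the same data.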

For 118 of the 124 tumors, both ER and HER2 status were available (Table 1). For all combinations of ER and HER2, samples with high as well as low TILs values were found, though ER+/HER2– tumors were found to have a lower mean TILs value than the other subtypes, with 95% confidence intervals only overlapping with those of ER+/HER2+ samples (Supplementary Table 1).

Discussion

The results of our reproducibility study are in accordance with two previous studies [Citation12,Citation13] and show that the internationally proposed methodology for standardized evaluation of TILs [Citation11] renders an acceptable agreement among observers.

Recently, the International Immuno-oncology Biomarker Working Group conducted two ring studies with the purpose of evaluating the inter-observer agreement for decentralized assessment of TILs in a total of 120 HER2-positive and triple negative breast cancers [Citation12]. The pre-specified endpoint (ICC > 0.7) was not reached with statistical significance (ICC: 0.70; 95% CI: 0.62–0.78), since the 95% confidence interval included 0.70, but the agreement was found to be relatively good. A similar, acceptable ICC of 0.62 was found in a smaller series of 75 TNBC [Citation13]. The present study is the first to evaluate the standardized assessment of TILs in a broader range of breast cancers also encompassing ER+/HER2– cancers, and shows an ICC and Kappa values similar to the previous findings.

The inter-observer variability can be attributed to several factors, e.g., different individual thresholds or scaling differences, as could be seen in Supplementary Figure 1. Intra-tumoral heterogeneity may also contribute to variations in the recordings, and may especially contribute to the presence of single outliers. According to the international standardized guidelines, the observer should evaluate the average TILs level on a full slide section and not concentrate on hot spots. It is, however, highly likely that the eye is drawn towards hot spots, which may affect the registration. Lymphocytic infiltration within a normal lobule or around areas of ductal carcinoma in situ within or outside the tumor area should not be included in the assessment of TILs, but may erroneously be included and lead to overestimation of TILs levels. Incorrect registrations may lead to single outliers, and finally, various tissue components (apoptosis, individual cell necrosis and stromal fibroblasts, etc.) may be misinterpreted as TILs. Though the use of immunohistochemical (IHC) stainings may assist in discriminating between intra-tumoral lymphocytes and other tissue components, it is at present not recommended to include IHC in the assessment.

A more thorough discrimination of the specific subpopulations of TILs and ratio between the various inflammatory cell types using IHC or gene expression profiling may prove to have clinical implications in terms of prognosis and predictive value regarding immune-modulating therapy [Citation19], but evaluation of this aspect was not within the scope of this study.

The use of digital analysis to optimize the evaluation of TILs is at present not recommended. It is, however, highly likely that practice as well as machine learning algorithms may improve the inter-observer agreement. This was shown in the 2nd ring-study by Denkert et al. [Citation12], where the reproducibility improved after the introduction of a specifically designed software program guiding the pathologist to evaluation in predefined screening areas and returning immediate feedback for each TILs value entered in the system.

The use of digitalized HE-sections could, on the other hand, be regarded as a weakness in this study, and may have introduced variations in the evaluation due to factors related to each observer’s availability of suitable IT-solutions (resolution of the computer-screen, speed of the internet, etc.). Furthermore, the quality of each scanned slide may also have affected the evaluation in a negative way.

In this study, the concordance for each observer was found to improve substantially, when using a single cut off (either 50% or 60%), indicating that separation between tumors with ‘low’ vs. ‘high’ level of TILs may be more reproducible and safer to use in a daily setting. A 50–60% cut off has been used in several studies, and tumors with high levels of TILs have been designated ‘lymphocytic predominant’ [Citation3,Citation4,Citation7]. The use of a single cut off could perhaps facilitate the implementation of TILs assessment in a routine setting. A weakness of this study is that the vast majority of the tumors had very low levels of TILs. It would have been preferable to have a higher number of tumors with TILs levels around the 50–60% cut off level.

Finally, the results of this study showed that ER+/HER2– tumors had lower mean TILs levels than HER2+ (and ER–/HER2–) tumors, and this is in accordance with other studies [Citation3,Citation20]. However, high levels of TILs (>60%) were found even among ER+/HER2– tumors, indicating that some ER+/HER2– tumors may also be considered immunogenic.

A prerequisite for introducing a new biomarker into daily clinical practice is that the test, besides being sensitive, specific and reproducible, is robust and preferably as non-laborious as possible. Considering this, the evaluation of TILs on HE sections is pragmatic, cost-effective and easy to implement. One of the strengths of this study is that it was performed on full slide sections, and that the participating observers represent pathologists situated nationwide and with various years of experience. The results as such reflect the variability that can be expected when performing a decentralized assessment of TILs in a representative cohort of breast cancers.

In conclusion, the results of the present study are in accordance with previous studies and show that the proposed methodology for standardized evaluation of TILs renders an acceptable inter-observer agreement. The agreement increased when dichotomizing the tumors into samples with ‘high’ or ‘low’ levels of TILs. The findings, however, indicate that the assessment needs further refinement and support the latest St. Gallen Consensus [Citation21] that routine reporting of TILs for early breast cancer is not ready for implementation in the daily clinical setting.

Disclosure statement

TT has filed a patent for a gene signature associated with efficacy of radiotherapy in breast cancer (International Patent Publication No. WO 2013/132354A2). The patent is not related to the present work.

AVL, TPT and TT have received royalties from Roche A/S for lectures given.

No potential conflicts of interest were disclosed by the other authors.

References

  • Moore OS, Jr, Foote FW, Jr. The relatively favorable prognosis of medullary carcinoma of the breast. Cancer. 1949;2:635–642.
  • Aaltomaa S, Lipponen P, Eskelinen M, et al. Lymphocyte infiltrates as a prognostic variable in female breast cancer. Eur J Cancer. 1992;28A:859–864.
  • Loi S, Sirtaine N, Piette F, et al. Prognostic and predictive value of tumor-infiltrating lymphocytes in a phase III randomized adjuvant breast cancer trial in node-positive breast cancer comparing the addition of docetaxel to doxorubicin with doxorubicin-based chemotherapy: BIG 02-98. J Clin Oncol. 2013;31:860–867.
  • Adams S, Gray RJ, Demaria S, et al. Prognostic value of tumor-infiltrating lymphocytes in triple-negative breast cancers from two phase III randomized adjuvant breast cancer trials: ECOG 2197 and ECOG 1199. J Clin Oncol. 2014;32:2959–2966.
  • Ibrahim EM, Al-Foheidi ME, Al-Mansour MM, et al. The prognostic value of tumor-infiltrating lymphocytes in triple-negative breast cancer: a meta-analysis. Breast Cancer Res Treat. 2014;148:467–476.
  • Ali HR, Provenzano E, Dawson SJ, et al. Association between CD8+ T-cell infiltration and breast cancer survival in 12,439 patients. Ann Oncol. 2014;25:1536–1543.
  • Loi S, Michiels S, Salgado R, et al. Tumor infiltrating lymphocytes are prognostic in triple negative breast cancer and predictive for trastuzumab benefit in early breast cancer: results from the FinHER trial. Ann Oncol. 2014;25:1544–1550.
  • Denkert C, Loibl S, Noske A, et al. Tumor-associated lymphocytes as an independent predictor of response to neoadjuvant chemotherapy in breast cancer. J Clin Oncol. 2010;28:105–113.
  • Salgado R, Denkert C, Campbell C, et al. Tumor-infiltrating lymphocytes and associations with pathological complete response and event-free survival in HER2-positive early-stage breast cancer treated with lapatinib and trastuzumab: a secondary analysis of the NeoALTTO trial. JAMA Oncol. 2015;1:448–454.
  • Wein L, Savas P, Luen SJ, et al. Clinical validity and utility of tumor-infiltrating lymphocytes in routine clinical practice for breast cancer patients: current and future directions. Front Oncol. 2017;7:156.
  • Salgado R, Denkert C, Demaria S, et al. The evaluation of tumor-infiltrating lymphocytes (TILs) in breast cancer: recommendations by an International TILs Working Group 2014. Ann Oncol. 2015;26:259–271.
  • Denkert C, Wienert S, Poterie A, et al. Standardized evaluation of tumor-infiltrating lymphocytes in breast cancer: results of the ring studies of the international immuno-oncology biomarker working group. Mod Pathol. 2016;29:1155–1164.
  • Swisher SK, Wu Y, Castaneda CA, et al. Interobserver agreement between pathologists assessing tumor-infiltrating lymphocytes (TILs) in breast cancer using methodology proposed by the International TILs Working Group. Ann Surg Oncol. 2016;23:2242–2248.
  • O'Brien KM, Cole SR, Tse CK, et al. Intrinsic breast tumor subtypes, race, and long-term survival in the Carolina Breast Cancer Study. Clin Cancer Res. 2010;16:6100–6110.
  • Gisev N, Bell JS, Chen TF. Interrater agreement and interrater reliability: key concepts, approaches, and applications. Res Social Adm Pharm. 2013;9:330–338.
  • Cohen J. Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull. 1968;70:213–220.
  • Denkert C, von Minckwitz G, Brase JC, et al. Tumor-infiltrating lymphocytes and response to neoadjuvant chemotherapy with or without carboplatin in human epidermal growth factor receptor 2-positive and triple-negative primary breast cancers. J Clin Oncol. 2015;33:983–991.
  • Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159–174.
  • Savas P, Salgado R, Denkert C, et al. Clinical relevance of host immunity in breast cancer: from TILs to the clinic. Nat Rev Clin Oncol. 2016;13:228–241.
  • Stanton SE, Adams S, Disis ML. Variation in the incidence and magnitude of tumor-infiltrating lymphocytes in breast cancer subtypes. JAMA Oncol. 2016;2:1354–1360.
  • Curigliano G, Burstein HJ, Winer EP, et al. De-escalating and escalating treatments for early-stage breast cancer: the St. Gallen International Expert Consensus Conference on the Primary Therapy of Early Breast Cancer 2017. Ann Oncol. 2017;28:1700–1712.
