Research Paper

Lung cancer-associated auto-antibodies measured using seven amino acid peptides in a diagnostic blood test for lung cancer

Pages 267-272 | Published online: 01 Aug 2010

Abstract

Autoantibody profiling is a developing approach that incorporates immune recognition of myriad aberrant cancer proteins into a single diagnostic assay. We have previously described methodology to screen T7-phage NSCLC-cDNA libraries for phage-expressed proteins recognized by NSCLC-associated antibodies, and developed a multiplex assay that has excellent ability to discriminate NSCLC from control samples. This follow-up report describes the development and testing of a diagnostic autoantibody assay that uses seven amino-acid peptides as capture proteins. A random-peptide M13-phage library was screened for proteins recognized by cancer-associated antibodies. One hundred twenty-one NSCLC case and control samples were divided into two groups for training and validation, or alternately, evaluated sequentially in a leave-one-out analysis. Candidate antibody-markers were ranked by statistical discrimination between cases and controls. Receiver-Operating-Characteristic (ROC-AUC) suggested the predictive potential of various marker combinations. A five-marker combination (AUC=0.982) afforded 90% sensitivity and 73% specificity in a training-and-testing strategy. Leave-one-out validation provided similar class prediction. Data confirm the potential of antibody profiling to provide high levels of cancer prediction. Random peptide libraries offer a universal source of capture proteins for antibody profiling that obviates the need for tumor-specific library construction and abrogates inherent problems with tumor heterogeneity during biomarker discovery.

Introduction

Blood tests could favorably alter management of non-small cell lung cancer (NSCLC), the most prevalent form of the disease [1–3]. Several NSCLC-associated proteins can be measured in blood, but their limited frequency has precluded their implementation as clinical diagnostic markers [4,5]. Measuring circulating antibodies to tumor proteins is an alternative to the traditional approach of measuring circulating proteins by immunoassay [6–12]. Since the immune system provides signal amplification of low-frequency proteins, this strategy can dramatically expand the repertoire of potential diagnostic markers. Our group and others have described methodology for measuring cancer-associated antibodies as markers of disease [13–16]. Published reports show how capture proteins can be selected from T7 bacteriophage tumor-cDNA libraries and then used to measure putative autoantibody markers. Although most individual markers have modest independent predictive accuracy, combined measurements using a panel of capture proteins have a proven ability to discriminate cancer from normal plasma, and autoantibody profiling holds excellent diagnostic potential for a number of cancers [13–17].

We now describe a modified approach using an M13 phage random peptide library as a source of capture proteins. This report supports the use of autoantibodies as markers of NSCLC, and shows that random peptide libraries are a viable alternative to tumor-specific cDNA libraries, which have, to date, been most widely used as a source of capture proteins for autoantibody profiling [13–17]. In contrast to conventional biomarker discovery and implementation strategies that focus on a single uniquely predictive marker, the following data continue to support the precept that numerous tumor-associated antibodies have independent predictive value, but that the fundamental power of antibody profiling lies in combining markers, which minimizes the importance of any individual marker.

Results

High throughput screening, marker selection and statistical modeling.

During the initial high-throughput phase of library screening, 474 of 4,000 clones from a biopan-enriched peptide library were highly reactive (more than 2 standard deviations from a software-generated linear regression line) with at least one of the five NSCLC samples used in screening. These candidate capture proteins and controls were compiled on a single diagnostic chip. One hundred twenty-one case and control samples were divided into two groups: (A) marker selection and statistical training, and (B) independent validation. Replicate chips were used to assay all samples, and standardized residual measurements were calculated for each potential marker. Each of the 474 candidate markers was independently analyzed by t-test for statistically significant differences between cases and controls in the 59 samples that made up half of the available sample set. Two hundred twenty-two of the 474 candidate markers showed statistically significant differences between 31 cases and 28 controls (p < 0.05); of these, 149 had p < 0.01, 77 had p < 0.001 and 49 had p < 0.0001. Sequence analysis revealed a very limited rate of redundancy among capture proteins. Thirty-two unique markers with high independent levels of discrimination were further evaluated for independent and combined predictive value by Receiver Operating Characteristic (ROC) analysis.

Predictive accuracy.

In a “training and testing” validation strategy, the half of the sample set designated for statistical model training was used to build classifiers for class prediction in the second half, which comprised 32 NSCLC cases (21 advanced stage, 11 early stage) and 30 risk-matched controls. Individual markers with the highest AUC were added sequentially to a logistic regression model.

The predictive accuracy of various marker combinations was calculated (). A five-marker combination that generated an AUC of 0.982 in the training set provided sensitivity of 90.6% (95% CI: 0.73 to 0.97) and specificity of 73% (95% CI: 0.53 to 0.87), generating a predictive accuracy of 82% in the independent validation set. A six-marker set yielded an AUC of 1.0; class prediction was not assessed, since an AUC of 1.0 in a small sample set analyzed with a complex statistical model is more an indication of data overfitting than of perfect discrimination. Specifically, when the AUC gets close to 1.0, multicollinearity in the explanatory variables causes high variance inflation factors and unstable parameter estimates and predictions. When minor fluctuations in the data are exaggerated, what is generally observed is a prediction based on random error or noise rather than on the underlying relationship [19,20].
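The figures quoted for the five-marker combination can be checked with simple arithmetic. The sketch below assumes the validation-set counts implied by the stated percentages (29 of 32 cases and 22 of 30 controls correctly classified); these counts are inferred here rather than reported in the text, and the function name is illustrative only.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from a 2x2 confusion table."""
    sensitivity = tp / (tp + fn)   # fraction of cases called positive
    specificity = tn / (tn + fp)   # fraction of controls called negative
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Assumed counts for validation Set B (32 cases, 30 controls); inferred, not reported.
sens, spec, acc = diagnostic_metrics(tp=29, fn=3, tn=22, fp=8)
print(f"sensitivity={sens:.1%} specificity={spec:.1%} accuracy={acc:.1%}")
```

Under these assumed counts the sketch gives 90.6%, 73.3% and 82.3%, which agrees with the reported values to rounding.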

To reduce sample size bias, we employed a leave-one-out cross-validation model that incorporates measurements from all 121 available case and control samples. Several marker combinations were tested (). The top six markers, which afforded perfect discrimination in sample cohort A, generated an AUC of 0.935 in the complete sample set; leave-one-out validation yielded sensitivity of 84.1% (95% CI: 0.72 to 0.91) and specificity of 79.3% (95% CI: 0.66 to 0.88). Adding markers improved prediction up to a point. A seven-marker combination increased the AUC to 0.949 and yielded sensitivity of 88.8% (95% CI: 0.78 to 0.95) and specificity of 84.5% (95% CI: 0.72 to 0.92). Extending the panel to ten markers neither increased the AUC (0.948) nor improved the predictive accuracy (sensitivity 87.3%, CI: 0.76 to 0.94; specificity 84.5%, CI: 0.72 to 0.92). Alternate marker combinations provided very similar levels of prediction. There was no apparent specificity for any histological subtype in this analysis. also shows the relative sensitivity of various marker combinations for samples stratified by stage of disease. Those data indicate superior performance in later-stage disease compared to early-stage disease (seven markers: stage I = 82.8%; stage II = 90.9%; stage III = 95.6%; stage IV = 100%).

Sequence analysis of phage-expressed proteins.

The nucleotide sequences of five capture proteins are shown in . Although the identity of the phage-expressed proteins is not critical for use in a diagnostic assay, the nucleotide sequences of all of the capture proteins used in the predictive models were compared to the GenBank database to obtain possible identities. Although multiple significant matches were noted for each sequence, the short sequence length precluded confirmation beyond statistical probability.

Discussion

We have previously described methodology to efficiently screen T7-phage NSCLC-cDNA libraries for multiple phage-expressed proteins recognized by antibodies in NSCLC patient plasma [15,16]. We showed that phage-expressed proteins could be displayed in an array fashion and used to measure multiple antibodies simultaneously, the combination of which has excellent ability to discriminate NSCLC from control samples [15,16]. This report explores an alternative to tumor-specific cDNA library screening as a source of capture proteins. Specifically, we screened a random peptide library with NSCLC plasma, and then used measurements from NSCLC case and high-risk control samples to statistically rank a panel of putative antibody-markers; the AUC generated by logistic regression suggested the predictive potential of various marker combinations. Class prediction performed on an independent set of cases and controls, or as part of leave-one-out analysis, supports published reports of strong predictive accuracy of antibody profiling [13–17]. The data support the precept that numerous autoantibodies are present in NSCLC, show that different marker combinations can provide high levels of prediction, and indicate that the process of antibody profiling for NSCLC could supplant the traditional importance of individual markers. Notably, the data indicate that this particular panel of markers offers higher sensitivity for late-stage than for early-stage disease. This weakness might have resulted from the use of stage II–IV cancer samples for physical selection during biopanning and/or statistical selection during high-throughput screening. If so, sensitivity for early-stage disease might be improved by using early-stage samples during the discovery steps.

While an ideal test would have both specificity and sensitivity approaching 100%, perfect prediction is not expected. As with screening for prostate, breast and colon cancer (PSA and digital rectal exams, mammograms and clinical breast exams, stool guaiac and colonoscopy), no single test is likely to be a practical and comprehensive independent modality for lung cancer diagnosis. As such, a blood test with sensitivity of 90% and specificity of 40%, which results in a CXR or CT scan, may be highly useful. The most useful comparison is to PSA with a cutoff of 4 ng/ml (AUC 0.64–0.78), which provides roughly 86% sensitivity and 30% specificity in the target population [21,22]. Comparison to imaging techniques such as mammography or chest CT (as a sole modality) is less useful, since their predictive accuracy is adjustable only through population selection or screening interval, although cost analysis and availability may ultimately be important. Nonetheless, the facts are instructive: sensitivity and specificity of a mammogram are 75–90% and 85–95%, respectively. Sensitivity of serial annual CT screening for lung cancer is 94%. Sensitivity of a single CT, calculated from the number of missed detections of both benign and malignant lung nodules on prevalence scanning, may be as low as 74%, but is dependent on the prevalence of disease in the target population. Specificity is 64% [23]. Given the lack of any other suitable standard and the severity of the disease, sensitivity >90% and specificity >60% are likely to provide high clinical utility. Importantly, since this dynamic prediction model allows sensitivity to be increased by sacrificing specificity (or vice versa), the accepted cutoff for binomial prediction (cancer yes vs. cancer no) may be adjusted for optimal performance.
Additional testing will be necessary to construct an optimal marker combination for NSCLC and further validation is required to define the predictive accuracy of this assay more precisely.
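The trade-off between sensitivity and specificity at different cutoffs can be illustrated with a minimal sketch. The predicted probabilities below are hypothetical values chosen for illustration, not measurements from this study, and the function name is an assumption.

```python
def sens_spec_at_cutoff(case_probs, control_probs, cutoff):
    """Call a sample 'cancer' when its predicted probability is >= cutoff."""
    sensitivity = sum(p >= cutoff for p in case_probs) / len(case_probs)
    specificity = sum(p < cutoff for p in control_probs) / len(control_probs)
    return sensitivity, specificity


# Hypothetical model outputs, for illustration only.
case_probs = [0.9, 0.8, 0.6, 0.4]
control_probs = [0.1, 0.3, 0.5, 0.7]

for cutoff in (0.35, 0.55, 0.75):
    sens, spec = sens_spec_at_cutoff(case_probs, control_probs, cutoff)
    print(f"cutoff={cutoff:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy example, lowering the cutoff raises sensitivity at the cost of specificity, and raising it does the reverse.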

Importantly, the multiplex marker approach offers flexibility to accommodate a variety of diagnostic applications and compensate for inherent heterogeneity of NSCLC. Selecting markers for specific cancer characteristics can easily expand the assay and improve predictive accuracy; this flexibility can even be extended to other cancers if alternate plasmas are used for screening. In context, the random peptide library provides a universal pool of capture proteins for marker selection, obviating the need for tumor, stage or histologically specific cDNA library construction. Although the short peptide sequences elude definitive identification of parent proteins being recognized, the accurate epitope mapping that results is an attractive alternative to the daunting task of mapping large phage-expressed capture proteins from cDNA libraries. Definitive knowledge of epitopes may offer a simpler translation from high throughput, phage-based biomarker discovery to multiplex assays for clinical diagnostics. The identification of a large number of unique epitopes and promising levels of cancer prediction shows that the combination of microarray technology and the random peptide library phage-based system is a highly efficient technique for biomarker discovery.

Materials and Methods

Human subjects.

Plasma samples from 73 individuals with histologically confirmed NSCLC (stage I–IV) and 60 risk-matched controls were used in marker selection and analysis. Five of the 73 NSCLC plasmas and two control plasmas were used for biopanning as described below. Another five NSCLC plasmas were used for high-throughput screening of phage clones picked after biopanning. The remaining 121 samples were divided into two independent case and control sets (). One half of the available sample set, comprising 31 NSCLC plasma samples (19 advanced stage, 12 stage I NSCLC) and 28 risk-matched controls, was used for marker selection and assay training. The second half, comprising 32 NSCLC samples (21 advanced stage, 11 stage I NSCLC) and 30 high-risk controls, was used for marker validation. Alternately, measurements from all 121 cases and controls were analyzed for predictive accuracy using a leave-one-out validation strategy. Patient characteristics are described in Table 1. All but one cancer patient and all controls had a >10 pack-year history of smoking. All but one individual is Caucasian, which reflects the demographic of the population from which the samples are derived.

Peptide library.

The disulphide-constrained heptapeptide M13 phage display library (Ph.D.-C7C™ Phage Display Peptide Library) from New England BioLabs, Inc. (Beverly, MA) contains 10^9 unique seven amino acid peptides. Each phage expresses five copies of the unique peptide in loop structures on the surface of the phage. M13 is a non-lytic filamentous phage grown in ER2738 bacteria (NEB).

Library biopanning.

Biopanning was used to enrich the population of immunogenic proteins specifically recognized by antibodies in NSCLC patient plasma; this step effectively depletes phage-displayed peptides bound by antibodies from normal individuals and positively selects for peptides recognized by antibodies from NSCLC patients. Plasma pooled from two non-cancer controls was used for negative selection, and pooled plasma from five NSCLC patients (stage II–IV; two adenocarcinomas, two squamous cell carcinomas, one undifferentiated NSCLC) was used for positive selection. The manufacturer's (NEB) protocol for solution binding with Protein A/G capture was used with some modification. This protocol allows the library to be probed with an antibody in solution, followed by affinity capture of the antibody-phage complexes onto agarose beads (A-3316 agarose beads coated with Fc-specific goat anti-human IgG; Sigma; Saint Louis, MO). A solution of 2 × 10^11 phage virions was incubated with the pool of control plasma, beads were added, and phage particles reacting with antibodies found in the control plasma were removed from solution by centrifugation. Phages remaining in solution were then incubated with pooled patient plasma, beads were added, and phage particles reacting with antibodies from NSCLC patients were collected by centrifugation. Phages bound to NSCLC antibody-coated beads were eluted, titered, amplified and titered again before the biopanning process was repeated. Titering after the third biopan revealed that no more phages were being removed by incubation with control plasma; to avoid a high level of redundancy, phages from the second biopan were used to create the arrays described below.

High-throughput screening and generation of a diagnostic chip.

Phage-containing supernatant from the biopanned library was amplified in ER2738 and grown on LB-agar plates covered with 0.6% agarose to isolate individual phage clones. A colony-picking robot (Genetix QPix 2, Hampshire, UK) was used to pick 4,000 individual colonies. The picked phages were re-amplified in 96-well plates, and 5 nl of phage-containing supernatant from each well was robotically spotted in duplicate onto FAST array slides (Schleicher and Schuell, Keene, NH) using an Affymetrix 417 Arrayer (Affymetrix, Santa Clara, CA). Replicates of four screening slides were generated, each comprising 1,000 individual phage clones. Replicate slides were blocked with 4% milk for 1.5 hours, then hybridized with plasma from five individual NSCLC patients not used in the biopanning process (stage II–IV; two adenocarcinomas, two squamous cell carcinomas, one undifferentiated NSCLC). To reduce antibody binding to non-cancer-associated proteins, plasma was pre-absorbed with sonicated ER2738 cultures infected with empty M13 (no insert) at 1:30 overnight at 4°C with rotation. These pre-absorbed plasmas were individually diluted 1:300 in 4% milk and incubated with a screening slide for 1 hour at room temperature. Slides were then washed in PBST (0.05% Tween) and incubated with mouse monoclonal antibody to M13 phage coat protein g8p (Abcam ab9225-1; Cambridge, MA) diluted 1:500 in 4% milk for 1 hour at room temperature. Slides were again washed in PBST (0.05% Tween), then hybridized with Cy™3-conjugated AffiniPure goat anti-mouse IgG (115-165-062) and Cy™5-conjugated AffiniPure F(ab')2 fragment goat anti-human IgG (109-176-088) antibodies (each from Jackson ImmunoResearch Laboratories, Inc.). Cy5 and Cy3 signal at each spot location was quantified using an Affymetrix 417 Scanner. Images were analyzed using GenePix 5.0 software (Axon Instruments, Union City, CA).
The amount of human antibody binding to each phage clone was quantified by Cy5 signal (antihuman secondary antibody) normalized to Cy3 signal (the amount of protein on each spot). Markers on each of the four screening slides with a Cy5/Cy3 signal ratio greater than 2 standard deviations from a software-generated linear regression were selected for further evaluation. These 474 clones were compiled, re-amplified and spotted in duplicate onto FAST slides as single diagnostic chips.

Sample analysis.

Available samples were divided into two independent case and control sets (Set A and B; ). All samples were assayed on replicate chips using an identical protocol to that described above for screening. The antibody reactivity of each sample with each of the 474 phage-expressed proteins identified in screening above was expressed as a “standardized residual” (distance from the regression line divided by the residual standard deviation). This normalization afforded a reliable measure of the amount of antibody binding to each unique phage-expressed protein relative to the amount of protein on each spot. This methodology is quantitative, reproducible, and compensates for chip-to-chip variability, allowing comparison between samples.
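The standardized residual defined above can be sketched as follows. This is an illustration under stated assumptions, not the software used in the study: it assumes Cy5 signal is regressed on Cy3 signal by ordinary least squares, and that the residual standard deviation uses n − 2 degrees of freedom. The function name and the toy intensities are hypothetical.

```python
from math import sqrt
from statistics import mean


def standardized_residuals(cy3, cy5):
    """Fit cy5 = a + b * cy3 by least squares and return each spot's residual
    divided by the residual standard deviation (n - 2 degrees of freedom)."""
    n = len(cy3)
    mx, my = mean(cy3), mean(cy5)
    sxx = sum((x - mx) ** 2 for x in cy3)
    sxy = sum((x - mx) * (y - my) for x, y in zip(cy3, cy5))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(cy3, cy5)]
    resid_sd = sqrt(sum(r * r for r in residuals) / (n - 2))
    return [r / resid_sd for r in residuals]


# Toy data: ten spots on a line, with one spot (index 4) showing extra Cy5 signal.
cy3 = [float(x) for x in range(1, 11)]
cy5 = [2.0 * x for x in cy3]
cy5[4] += 10.0

z = standardized_residuals(cy3, cy5)
reactive = [i for i, value in enumerate(z) if value > 2.0]  # spots > 2 SD above the line
print(reactive)
```

Because each chip is normalized against its own regression line, a value computed this way is expressed in units of that chip's scatter, which is what allows measurements from different chips to be compared.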

Standardized residual measurements for each putative marker were independently analyzed by t-test for statistically significant differences between the 31 case and 28 control samples of sample cohort A. Putative markers were ranked for independent predictive value by p value. The most predictive individual markers were checked for redundancy by PCR amplification using commercial M13-phage vector primers (NEB). Redundant clones were eliminated. Receiver operating characteristic (ROC) curves were generated for each marker with significance of p < 0.0001 between cases and controls among the 59 samples in sample cohort A. Markers were ranked by individual predictive potential using the area under the ROC curve (AUC) and sequentially combined in a logistic regression model; markers were added until a “threshold” was reached beyond which improvement in AUC could not be demonstrated. The predictive accuracy of each marker combination was then assessed on the independent sample Set B, comprising 32 case and 30 control samples ().
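Ranking individual markers by AUC can be sketched as below. The sketch covers only the single-marker ranking step, not the sequential logistic regression, and it computes the AUC through its rank interpretation (the probability that a case scores higher than a control). Marker names and values are hypothetical, and the sketch assumes that higher measurements indicate cancer.

```python
def auc(case_values, control_values):
    """Area under the ROC curve for one marker: the probability that a randomly
    chosen case scores higher than a randomly chosen control (ties count 0.5)."""
    wins = 0.0
    for c in case_values:
        for k in control_values:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


def rank_markers_by_auc(cases, controls):
    """cases and controls map marker name -> list of standardized residuals."""
    scores = {name: auc(cases[name], controls[name]) for name in cases}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical marker names and measurements, for illustration only.
cases = {"clone_A": [3.0, 4.0, 5.0], "clone_B": [1.0, 2.0, 0.5]}
controls = {"clone_A": [1.0, 2.0, 3.0], "clone_B": [1.0, 1.5, 0.2]}

ranking = rank_markers_by_auc(cases, controls)
for name, score in ranking:
    print(f"{name}: AUC={score:.3f}")
```

The pairwise form is quadratic in sample size, which is acceptable for cohorts of this size; a sort-based rank-sum calculation would give the same value.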

The entire sample set (121 case and control samples from cohorts A and B combined) was also tested in a leave-one-out validation [18]. The leave-one-out method removes the measurements of individual samples one at a time and uses the remaining 120 samples as classifiers to predict the identity of the one left out; misclassification of cases or controls is used to calculate sensitivity and specificity of the assay. A number of marker combinations were tested, including the six marker set that provided perfect discrimination in half the sample set, and combinations of the top seven and top ten ranked markers (). Relative sensitivity of various marker combinations for samples stratified by stage of disease was also determined within this analysis ().
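The leave-one-out procedure can be sketched as follows. To keep the example self-contained, a nearest-centroid rule stands in for the logistic regression model used in this study; that substitution, the function names and the toy data are all assumptions made for illustration.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier: assign x to the class (1 = case, 0 = control)
    whose mean marker vector is closer."""
    def centroid(label):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return 1 if dist2(x, centroid(1)) < dist2(x, centroid(0)) else 0


def leave_one_out(xs, ys, predict=nearest_centroid_predict):
    """Hold out each sample in turn, train on the rest, and tally the errors.
    Returns (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        pred = predict(train_x, train_y, xs[i])
        if ys[i] == 1:
            tp += pred == 1
            fn += pred == 0
        else:
            tn += pred == 0
            fp += pred == 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy single-marker data: one case resembles controls and one control resembles cases.
xs = [[2.0], [2.2], [1.8], [0.2], [0.0], [0.3], [-0.2], [1.9]]
ys = [1, 1, 1, 1, 0, 0, 0, 0]

sensitivity, specificity = leave_one_out(xs, ys)
print(sensitivity, specificity)
```

Any classifier with the same `predict(train_x, train_y, x)` signature could be passed in place of the stand-in, including a fitted logistic regression.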

DNA sequence analysis and protein identification.

DNA sequence analysis was used to eliminate redundant phages from the assay. Redundancy in selected candidate phages was <4%. Sequence homology to known proteins was examined based on significant nucleotide and translated nucleotide matches (bit score, e-value and percent sequence match) with a single gene in the GenBank database using BLASTN and BLASTX search engines. For essentially all seven amino acid capture proteins, the short sequence length prevented protein identification beyond marginal degrees of statistical certainty. Amino acid sequences for the five highest-ranking markers are included in .

Abbreviations

NSCLC = non-small cell lung cancer

ROC = receiver-operating-characteristic

AUC = area under the curve

Figures and Tables

Table 1 Patient characteristics

Table 2 Sequential marker combination

Table 3 Sequential addition of markers

Acknowledgements

These studies were supported by NIH R01 # CA10032-01, the Veteran's Administration Merit Review Program and the Kentucky Lung Cancer Research Association. We would like to thank Aaron Bungum and Keith Dickerson for their help with sample inventory and management.

References

  1. Hoffman PC, Mauer AM, Vokes EE. Lung Cancer. Lancet 2000; 355:579–585
  2. Iyengar P, Tsao MS. Clinical relevance of molecular markers in lung cancer. Surg Oncol 2002; 11:167–179
  3. Conrads TP, Hood BL, Petricoin EF 3rd, Liotta LA, Veenstra TD. Cancer proteomics: many technologies, one goal. Expert Rev Proteomics 2005; 2:693–703
  4. Niklinski J, Furman M. Clinical tumour markers in lung cancer. Eur J Cancer Prev 1995; 4:129–138
  5. Ferrigno D, Buccheri G, Biggi A. Serum tumour markers in lung cancer: history, biology and clinical applications. Eur Respir J 1994; 7:186–197
  6. Buccheri G, Violante B, Sartoris AM, Ferrigno D, Curcio A. Clinical value of a multiple biomarker assay in patients with bronchogenic carcinoma. Cancer 2001; 34:65–69
  7. Lombardi C, Tassi GF, Pizzocolo G, Donato F. Clinical significance of a multiple biomarker assay in patients with lung cancer. A study with logistic regression analysis. Chest 1990; 97:639–644
  8. Abu-Shakra M, Buskila D, Ehrenfeld M, Conrad K, Shoenfeld Y. Cancer and autoimmunity: autoimmune and rheumatic features in patients with malignancies. Ann Rheum Dis 2001; 60:433–441
  9. Stockert E, Jager E, Chen YT, Scanlan MJ, Gout I, Karbach J, et al. A survey of the humoral immune response of cancer patients to a panel of human tumor antigens 1998; 187:1349–1354
  10. Sioud M, Hansen MH. Profiling the immune response in patients with breast cancer by phage-displayed cDNA libraries. Eur J Immunol 2001; 31:716–725
  11. Robinson WH, DiGennaro C, Hueber W, Haab BB, Kamachi M, Dean EJ, et al. Autoantigen microarrays for multiplex characterization of autoantibody responses. Nat Med 2002; 8:295–301
  12. Tureci O, Sahin U, Zwick C, Neumann F, Pfreundschuh M. Exploitation of the antibody repertoire of cancer patients for the identification of human tumor antigens. Hybridoma 1999; 18:23–28
  13. Wang X, Yu J, Sreekumar A, Varambally S, Shen R, Giacherio D, et al. Autoantibody signatures in prostate cancer. N Engl J Med 2005; 353:1224–1235
  14. Chatterjee M, Mohapatra S, Ionan A, Bawa G, Ali-Fehmi R, Wang X, et al. Diagnostic markers of ovarian cancer by high-throughput antigen cloning and detection on arrays. Cancer Res 2006; 66:1181–1190
  15. Zhong L, Hidalgo GE, Stromberg AJ, Khattar NH, Jett JR, Hirschowitz EA. Using protein microarray as a diagnostic assay for non-small cell lung cancer. Am J Respir Crit Care Med 2005; 172:1308–1314
  16. Zhong L, Coe SP, Stromberg AJ, Khattar NH, Jett JR, Hirschowitz EA. Profiling Tumor-Associated Antibodies for Early Detection of Non-Small Cell Lung Cancer. J Thoracic Oncol 2006; 1:513–519
  17. Chen G, Wang X, Yu J, Varambally S, Yu J, Thomas DG, et al. Autoantibody profiles reveal ubiquilin 1 as a humoral immune response target in lung adenocarcinoma. Cancer Res 2007; 67:3461–3467
  18. Molinaro AM, Simon R, Pfeiffer RM. Prediction error estimation: a comparison of resampling methods. Bioinformatics 2005; 21:3301–3307
  19. Steyerberg EW, Eijkemans MJ, Harrell FE Jr, Habbema JD. Prognostic modeling with logistic regression analysis: in search of a sensible strategy in small data sets. Med Decis Making 2001; 21:45–56
  20. Katz MH. Multivariable analysis: a primer for readers of medical research. Ann Intern Med 2003; 138:644–650
  21. Tanguay S, Begin LR, Elhilali MM, Behlouli H, Karakiewicz PI, Aprikian AG. Comparative evaluation of total PSA, free/total PSA, and complexed PSA in prostate cancer detection. Urology 2002; 59:261–265
  22. Ozdal OL, Aprikian AG, Begin LR, Behlouli H, Tanguay S. Comparative evaluation of various prostate specific antigen ratios for the early detection of prostate cancer. Br J Urol 2004; 93:970–974
  23. Swensen SJ, Jett JR, Sloan JA, Midthun DE, Hartman TE, Sykes AM, et al. Screening for Lung Cancer with Low-Dose Spiral Computed Tomography. Am J Respir Crit Care Med 2002; 165:508–513
