Editorial

Bringing protein biomarker discovery to the clinic

Pages 305-307 | Published online: 09 Jan 2014

This is a very exciting time to be a life scientist. This opinion is based primarily on the technologies that enable information on hundreds of proteins (or genes, transcripts and metabolites) to be collected nearly simultaneously. Contrast this with the sequencing of the first protein, insulin, by Fred Sanger [1]. Complete sequencing of the A- and B-chains took years, and the achievement was considered so monumental that Sanger was awarded the Nobel Prize in 1958. Nowadays, most protein sequencing is performed by mass spectrometry (MS) and can be completed in a matter of hours or, at most, a few days. Another major difference in protein science today compared with a few years ago is the density of data acquired for specific proteins. In the past, scientists might spend several months or years studying the intricate details of a single protein before moving on to the next protein of interest. This paradigm represented a high data density per protein. The new, coexisting paradigm collects low-density data per protein. For instance, global proteomic comparison studies using MS are primarily based on digesting complex samples (such as cell lysates) into peptides, and then attempting to identify as many peptides as possible using multidimensional fractionation coupled with tandem MS (MS-MS) analysis [2]. The final conclusion reporting the relative quantity of proteins within the comparative samples is often based on data obtained from a single peptide acting as a surrogate for the whole protein. Using this ‘low data density per protein’ model, information on thousands of proteins can be acquired in only a few days.

This low data density model is the one that is generally followed in the search for proteomic biomarkers of disease. In a simplified scenario, a complex proteome is digested into tryptic peptides (which increases the complexity of the mixture by an estimated two orders of magnitude) and separated using liquid chromatography (LC) prior to injection into a mass spectrometer. The mass spectrometer, operating in a data-dependent MS-MS mode to aid peptide identification, then records as many signals as possible. The relative quantities of the proteins between the comparative samples are then calculated by comparing the peak areas of corresponding signals. This general approach is a very attractive method for biomarker discovery because of its inherent simplicity. It requires minimal sample preparation, and most MS laboratories have the capability of measuring the signals of thousands of proteins within complex mixtures. Even with this data-gathering capability, however, finding biomarkers that laboratories are willing to spend considerable resources validating remains a formidable task.
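
The peak-area comparison described above can be illustrated with a minimal sketch. It assumes that peptide peak areas have already been extracted and assigned to proteins; the protein names, peptide sequences and area values are invented for illustration, and summing peptide areas per protein is only one of several possible roll-up choices, not a method prescribed by this editorial.

```python
import math
from collections import defaultdict

# Hypothetical peak areas (arbitrary units), keyed by (protein, peptide).
healthy = {
    ("PROT_A", "PEPTIDEK"): 1.0e6,
    ("PROT_A", "SAMPLER"): 2.0e6,
    ("PROT_B", "ANOTHERK"): 4.0e5,
}
disease = {
    ("PROT_A", "PEPTIDEK"): 2.0e6,
    ("PROT_A", "SAMPLER"): 4.0e6,
    ("PROT_B", "ANOTHERK"): 1.0e5,
}


def protein_areas(peak_areas):
    """Sum peptide peak areas per protein (an assumed roll-up rule)."""
    totals = defaultdict(float)
    for (protein, _peptide), area in peak_areas.items():
        totals[protein] += area
    return dict(totals)


def peptide_counts(peak_areas):
    """Count how many peptides support each protein."""
    counts = defaultdict(int)
    for protein, _peptide in peak_areas:
        counts[protein] += 1
    return dict(counts)


def log2_ratios(case, control):
    """log2(case / control) for proteins with signal in both samples."""
    case_tot = protein_areas(case)
    ctrl_tot = protein_areas(control)
    return {
        protein: math.log2(case_tot[protein] / ctrl_tot[protein])
        for protein in case_tot
        if ctrl_tot.get(protein, 0.0) > 0.0 and case_tot[protein] > 0.0
    }


ratios = log2_ratios(disease, healthy)
support = peptide_counts(disease)
for protein in sorted(ratios):
    print(protein, round(ratios[protein], 2), "peptides:", support[protein])
```

In this invented example PROT_B is quantified from a single peptide, which is exactly the low data density situation in which one peptide stands in for the whole protein.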

Selecting the correct starting materials & purpose

The correct starting materials need to be selected when attempting to discover a clinically useful proteomic biomarker. The first choice is obvious: the disease of interest and the role the biomarker will serve in the course of this disease. For illustrative purposes, consider an investigator interested in finding a biomarker related to renal cell carcinoma (RCC). RCC is the most common type of kidney cancer in adults and arises within the proximal renal tubule [3]. This cancer is detected most often during noninvasive imaging that is used to evaluate nonspecific symptoms. Tumors detected in this fashion tend to be smaller, earlier-stage tumors than those found in patients who exhibit RCC-related symptoms, such as paraneoplastic disorders (e.g., hypertension, anemia and abnormal liver function) or pain or a mass related to metastatic disease. Almost 60% of RCCs are diagnosed while the disease is still localized [3]. Unfortunately, RCC is notoriously chemoresistant. What would benefit RCC patients is a biomarker that predicts therapy response, enabling the most effective therapy to be selected first. Ovarian cancer, by contrast, has a different pressing need. This disease affects over 22,000 women in the USA each year. When found at an early stage, this cancer is treatable; more than 90% of women diagnosed with ovarian cancer before it has spread beyond the ovary live for at least 5 years after detection. Unfortunately, fewer than 20% of ovarian cancers are diagnosed at this early stage. Therefore, the great need for ovarian cancer patients is a biomarker that can diagnose early-stage disease.

The next choice is the matrix that will be used to find the desired biomarker. This choice requires consideration of the downstream goal, which is to discover a biomarker that can ultimately be measured within a clinical laboratory using an easily obtained sample. The obvious choices for this purpose would be serum/plasma or urine. These biofluids are easily obtained and are historically known to contain a great deal of diagnostic information. Serum/plasma is also very high in protein content, so detecting potential protein biomarkers will not be difficult. Unfortunately, there are a number of difficulties associated with attempting to find a biomarker within any biofluid. The first is determining the origin of a biofluid-based biomarker: while a large number of differentially abundant proteins can be identified when comparing biofluids from RCC-affected and healthy individuals, it is difficult to determine conclusively which of them emanate from the tumor. The second is concentration: any useful biomarker that does originate directly from the tumor would be present at a much lower concentration at the site of biofluid draw than at the site of the tumor.

Attempting to discover a biomarker at the site of the tumor itself seems attractive. The concentration of a useful biomarker would be highest at this site and, by comparison with nearby healthy tissue, its specificity to the tumor can be readily established. Studies can also be performed to determine the functional role of the biomarker in RCC development within the context of the tumor itself. Unfortunately, tissue is not a useful sample for early cancer diagnosis because it cannot be routinely acquired at yearly physicals, for instance. One solution is to work with both materials. This dual approach would first compare the proteomes of RCC tumors with nearby healthy tissue to identify proteins that are unique to the tumor. The next study would compare serum/plasma from RCC patients with that obtained from healthy controls. The goal is to find proteins that are unique both to the tumor and to the serum obtained from the RCC-affected individual. While this approach increases the experimental time required, it combines the advantages of studying tissue and biofluids. The ultimate result would be a biomarker that is unique to the tumor but can be measured in an easily acquired biofluid.
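
The selection step of this dual approach amounts to intersecting two candidate lists. The following is a minimal sketch, assuming that each comparison has already been reduced to a log2 ratio per protein; the protein labels, the ratio values and the twofold threshold are hypothetical choices made for illustration, and a simple ratio threshold is a stand-in for whatever test of tumor uniqueness a real study would use.

```python
# Hypothetical log2 ratios from the two comparisons described in the text.
tumor_vs_adjacent = {"P1": 3.2, "P2": 0.1, "P3": 2.5, "P4": -2.8}
serum_vs_control = {"P1": 1.4, "P3": 0.2, "P4": -1.1, "P5": 2.0}


def elevated(log2_ratios, threshold=1.0):
    """Proteins whose log2 ratio meets an assumed elevation threshold."""
    return {protein for protein, ratio in log2_ratios.items() if ratio >= threshold}


def dual_candidates(tissue, serum, threshold=1.0):
    """Proteins elevated in tumor tissue AND in patient serum."""
    return sorted(elevated(tissue, threshold) & elevated(serum, threshold))


candidates = dual_candidates(tumor_vs_adjacent, serum_vs_control)
tissue_only = sorted(elevated(tumor_vs_adjacent) - elevated(serum_vs_control))
serum_only = sorted(elevated(serum_vs_control) - elevated(tumor_vs_adjacent))

print("elevated in both:", candidates)
print("tissue only:", tissue_only)
print("serum only:", serum_only)
```

Proteins that appear only in the serum list are the ones whose tumor origin cannot be established, while those that appear only in the tissue list may never reach a measurable concentration in the biofluid.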

Using the right tools

Once the proper materials are identified, the next step is gathering the proper tools to achieve the desired goal. Proteomics is now an extremely broad field that encompasses numerous highly specialized technologies. Many of these technologies offer several options that can be applied to a biomarker discovery study. For example, there are several different mass spectrometers that can be used for observing and/or identifying proteins in complex proteomes [4]. Not all of them are good choices for biomarker discovery. The general strategy of finding a biomarker by comparing as many proteins as possible between samples acquired from healthy and disease-affected individuals requires a mass spectrometer with high dynamic range, high sensitivity, tandem MS capabilities and a fast duty cycle (for selection and fragmentation of individual peptides). All of these parameters are characteristic of high-performance, modern ion-trap mass spectrometers. While TOF instruments equipped with MALDI or SELDI sources have been used in the past, this configuration lacks the ability to actually identify the species that it measures and, unless significant off-line fractionation is performed, detects only the most abundant proteins within a complex proteome (the concentration region where biomarkers are not anticipated to reside).

The next tool to consider is sample preparation and fractionation. Proteomes from biofluids and tissues are far too complicated to be infused directly into a mass spectrometer, and high-abundance proteins in serum and plasma limit the ability to identify low-abundance proteins. The strategy most commonly used is ‘divide and conquer’ (i.e., separate the proteins and introduce as few as possible into the instrument at one time, giving the mass spectrometer time to select and attempt to identify as many species as possible). There are many options for fractionating complex proteomes. The fractionation choice is often dictated by throughput and compatibility with other steps in the study. For example, if the mass spectrometer chosen is an ion trap equipped with an electrospray ionization (ESI) source (a good choice), the chromatography used to separate the proteins just before they enter the instrument must use a volatile solvent that is compatible with ESI. This is why reverse-phase chromatography is chosen almost universally. A decision must also be made as to how many fractionation steps should be used. A popular choice in proteomics is to first separate the proteome into fractions using strong cation exchange and then analyze each of these fractions by reverse-phase LC coupled on-line with the mass spectrometer. This combination, known as MudPIT (multidimensional protein identification technology), allows more than 3000 proteins to be identified in a couple of days [5]. Regardless of the fractionation scheme chosen, biomarker discovery via LC-MS-MS analysis of biofluids is a low-throughput technology, requiring several days per sample.
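
A rough calculation shows why two-dimensional fractionation makes cohort-scale discovery slow. The sketch below assumes that every first-dimension fraction receives its own reverse-phase LC-MS-MS run; the fraction count, gradient length, per-run overhead and cohort size are hypothetical numbers chosen only to land in the range of a couple of days per sample mentioned above.

```python
def hours_per_sample(n_fractions, gradient_hours, overhead_hours):
    """Instrument time for one sample: one LC-MS-MS run per fraction."""
    return n_fractions * (gradient_hours + overhead_hours)


def cohort_days(n_samples, n_fractions, gradient_hours, overhead_hours):
    """Continuous instrument days for a whole cohort on one instrument."""
    total_hours = n_samples * hours_per_sample(
        n_fractions, gradient_hours, overhead_hours
    )
    return total_hours / 24.0


# Hypothetical plan: 20 cation-exchange fractions, 2 h gradients,
# 0.5 h of loading and re-equilibration per run, 30 samples.
per_sample = hours_per_sample(20, 2.0, 0.5)
whole_cohort = cohort_days(30, 20, 2.0, 0.5)

print(f"{per_sample:.0f} instrument hours per sample")
print(f"{whole_cohort:.1f} instrument days for the cohort")
```

Under these assumed numbers a modest 30-sample comparison already occupies one instrument for about two months, before any replicate analyses are considered.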

Conclusion

Even with our ability to make wise choices and identify large numbers of proteins in biofluids, the discovery and translation of biomarkers for clinical use has been a greater challenge than many expected. This failure to meet expectations, however, does not mean that significant strides have not been made in bringing proteomic discoveries closer to the clinic. Proteomics laboratories can characterize biofluids with a depth at least an order of magnitude greater than was possible only 5–7 years ago, and have expanded our capabilities to include such specimens as frozen and formalin-fixed paraffin-embedded tissues [6].

There have been at least two major stumbling blocks in translating biomarker discoveries to clinical applications. One is that the rate at which potentially useful biomarkers are discovered far exceeds the pace of validation studies. The other, which has been the greater stumbling block, is the questionable quality or utility of the potential biomarkers that have been discovered and published. The literature is polluted with reports of biomarkers that are potentially useful for all kinds of diseases. Unfortunately, many of these studies are based on very few samples and/or have suspect specificity for the disease that was being studied. Given the burden and cost of validation, it is imperative that the biomarkers with the greatest chance of eventually achieving clinical utility be selected. The time has come for developing criteria that rank the probability that a discovered biomarker will pass all the steps required to become a validated, clinically useful biomarker. Criteria should include such parameters as the problem to be addressed (e.g., the need for a diagnostic or prognostic biomarker), the number of samples tested in the discovery phase, the quality of the quantitative measurements of the individual proteins, the biomarker’s specificity for a particular disease, the biomarker’s future utility (e.g., diagnostic, prognostic and therapeutic monitoring), and the ease with which the biomarker is monitored (e.g., tissue vs biofluid and ELISA vs MS assay). The lack of these standards has impaired the field of proteomic biomarker discovery and has resulted in a lack of confidence in taking potential biomarkers through the various stages of validation.

Even with all of these parameters in place, what is critical to the success of any project is careful planning prior to beginning the first experiment. In designing a biomarker discovery project, it is critical that an interdisciplinary group of scientists (including physicians, clinicians, analytical chemists and bioinformaticists) be asked to provide input. The best study design will be attained only through gathering information from the various disciplines ultimately involved in the project.
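
One way such ranking criteria might be made operational is a simple weighted score. The sketch below is purely illustrative: the field names, the weights, the 0 to 1 scales and the target sample count are all assumptions introduced here, not established standards, and any real scheme would need to be agreed upon by the interdisciplinary group described above.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    n_samples: int        # samples tested in the discovery phase
    quant_quality: float  # 0-1, quality of the quantitative measurement
    specificity: float    # 0-1, specificity for the disease studied
    ease_of_assay: float  # 0-1, e.g. biofluid ELISA near 1, tissue MS lower


# Hypothetical weights (summing to 1) and an assumed sample-count target.
WEIGHTS = {
    "samples": 0.25,
    "quant_quality": 0.25,
    "specificity": 0.30,
    "ease_of_assay": 0.20,
}
SAMPLE_TARGET = 100


def score(candidate):
    """Weighted score in [0, 1]; higher suggests a better validation bet."""
    samples_term = min(candidate.n_samples / SAMPLE_TARGET, 1.0)
    return (
        WEIGHTS["samples"] * samples_term
        + WEIGHTS["quant_quality"] * candidate.quant_quality
        + WEIGHTS["specificity"] * candidate.specificity
        + WEIGHTS["ease_of_assay"] * candidate.ease_of_assay
    )


def rank(candidates):
    """Order candidates from highest to lowest score."""
    return sorted(candidates, key=score, reverse=True)


pool = [
    Candidate("marker_A", n_samples=10, quant_quality=0.9,
              specificity=0.4, ease_of_assay=0.8),
    Candidate("marker_B", n_samples=80, quant_quality=0.7,
              specificity=0.8, ease_of_assay=0.5),
]
ranked = rank(pool)
for candidate in ranked:
    print(candidate.name, round(score(candidate), 3))
```

In this invented pool, the marker with the better measurement quality but only ten discovery samples and weak specificity ranks below the one supported by a larger, more specific study, which is the kind of prioritization the criteria are meant to encourage.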

Biomarkers discovered via MS technology will find their utility within the clinic. The need for biomarkers is too great for this not to happen. It is more probable that biomarkers will enter the clinic as panels rather than as a single protein acting as the sole indicator of the presence of a specific condition. This scenario makes perfect sense when one considers that a protein whose function has been modified by a particular condition will propagate this change to other proteins. The detection of aberrant changes in modifications such as phosphorylation and glycosylation will become increasingly important and will provide biomarkers with greater specificity than those that simply change in abundance. Measuring the natural variability of any potential biomarker before it moves to a validation phase will also become increasingly important. Once biomarkers discovered using MS are validated for clinical use, the platform used to routinely measure them will move to something that is more common in clinical laboratories. Antibody-based kits immediately come to mind as tools for quantitating protein levels in biofluids and tissues. MS techniques (such as multiple-reaction monitoring, more commonly known as MRM) have wonderful specificity and quantitative ability; however, routinely using this technique in clinical laboratories will require a significant investment in training and capital equipment.
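
Measuring natural variability can be as simple as asking whether a disease-associated shift stands out from the spread seen among healthy individuals. The sketch below is one hedged way to frame that check; the measurement values are invented, and the rule of requiring a shift larger than two healthy standard deviations is an assumed cut-off, not a recommendation from this editorial.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return statistics.stdev(values) / mean


def exceeds_natural_variability(healthy, disease, k=2.0):
    """True if the mean shift exceeds k healthy standard deviations."""
    shift = abs(statistics.mean(disease) - statistics.mean(healthy))
    return shift > k * statistics.stdev(healthy)


# Hypothetical levels of one protein (arbitrary units).
healthy_levels = [10.0, 12.0, 11.0, 9.0, 13.0]
small_shift = [12.0, 13.0, 11.0]
large_shift = [20.0, 22.0, 21.0]

cv = coefficient_of_variation(healthy_levels)
print(f"healthy CV: {cv:.1%}")
print("small shift stands out:", exceeds_natural_variability(healthy_levels, small_shift))
print("large shift stands out:", exceeds_natural_variability(healthy_levels, large_shift))
```

A candidate whose apparent change falls within the healthy spread, like the small shift here, would be a poor use of validation resources regardless of how it ranked in the discovery experiment.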

Financial & competing interests disclosure

This project has been funded in whole or in part with federal funds from the National Cancer Institute, NIH, under Contract N01-CO-12400. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the US Government. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.

No writing assistance was utilized in the production of this manuscript.

References

  • Sanger F, Tuppy H. The amino-acid sequence in the phenylalanine chain of insulin. 1. The identification of lower peptides from partial hydrolysates. Biochem. J. 49, 481–490 (1951).
  • McDonald WH, Yates JR 3rd. Shotgun proteomics and biomarker discovery. Dis. Markers 18, 99–105 (2002).
  • Jones J, Pantuck AJ. Genomics and proteomics in renal cell carcinoma: diagnosis, prognosis, and treatment selection. Curr. Urol. Rep. 9, 9–14 (2008).
  • Aebersold R, Mann M. Mass spectrometry-based proteomics. Nature 422, 198–207 (2003).
  • Liu H, Lin D, Yates JR 3rd. Multidimensional separations for protein/peptide analysis in the post-genomic era. Biotechniques 32, 898–902 (2002).
  • Fowler CB, Cunningham RE, Waybright TJ et al. Elevated hydrostatic pressure promotes protein recovery from formalin-fixed, paraffin-embedded tissue surrogates. Lab. Invest. 88, 185–195 (2008).
