Research Paper

Reporting of preclinical tumor-graft cancer therapeutic studies

Pages 1262-1268 | Received 07 Mar 2012, Accepted 08 Aug 2012, Published online: 16 Aug 2012

Abstract

Purpose: Characterize the parameters of reporting tumor-graft experiments for oncologic drug development.

Experimental Design: Using Institute of Scientific Information impact factors, we identified the most-cited medical and oncology journals with tumor-graft experiments in murine models. For each article, the characteristics of the experimental design, outcome measurements, and statistical analysis were examined.

Results: We examined 145 articles describing tumor-graft experiments from October through December 2008. The articles spanned a range of disease types, animal models, treatments, and delivery methods. One hundred (69%) articles were missing information needed to replicate the experiments. Outcome measurements included tumor size (83%), biological changes (57%), and survival or cure-rate outcomes (28%). Thirty-three percent did not specify how tumor size was measured and 30% were missing the formula for evaluating volume. Only 14% utilized appropriate statistical methods. Ninety-one percent of studies were reported as positive and 7% reported mixed positive-negative results; only 2% of studies were reported as negative or inconclusive. Twenty-two articles from 2012 showed improvement in the utilization of statistical methods (35% optimal, p = 0.05), but a similar fraction had experimental design issues limiting reproducibility (82%; p = 0.32), and 91% reported positive results.

Conclusions: Tumor-graft studies are reported without a set standard, often without the methodological information necessary to reproduce the experiments. The high percentage of positive trials suggests possible publication bias. Considering the widespread use of such experiments for oncologic drug development, scientists and publishers should develop experimental and publication guidelines for such experiments to ensure continued improvements in reporting.

Introduction

The utility of animal experiments for predicting clinical efficacy in humans has been controversial.Citation1-Citation3 Reasons for failure include, but are not limited to, design flaws in the studies themselves, inadequate or biased data collection, and inherent differences between the animal and human models of illness.Citation2,Citation4

In cancer research, the development of new agents is predominantly based upon preclinical studies of drug activity, typically assessed with in vitro cell line and in vivo murine model systems. However, only 4% of agents in classic phase I studies of new single agents show a response, a dismal clinical success rate.Citation5 An analysis by the National Cancer Institute reported that tumor-graft experimental results, when compared with eventual phase II results of an agent, had only a modest association of activity within the same tumor histology.Citation6 Similarly, a literature review comparing drug activity in Phase II trials and in vitro and in vivo preclinical activity showed mixed results in different tumor types and a lack of any predictive ability of murine allograft models.Citation7

The lack of good model systems has led drug-development researchers to use human tumor-graft experiments in mice as the best available option. These studies follow a customary experimental pattern: first, tumor cells are implanted into a mouse; then the animal is treated with an agent of interest; and finally, a measurement of anti-tumor efficacy is obtained.

To date there has been little scrutiny of the methodology and reporting of such studies. An examination of the methods of such experiments prior to moving drugs forward clinically may help explain the poor track record of predicting the clinical efficacy of treatments in humans. We conducted a literature review of published tumor-graft experiments performed for therapeutic drug testing in the high-impact medical and oncology journals that most commonly publish in vivo murine tumor-graft data. Our goal was to summarize the methodology and reporting of tumor-grafting studies and to assess potential issues and sources of bias that may contribute to poor translational utility. In addition, we sought to determine whether improvements have been made in the era of increased awareness of the need for reproducible research.

Results

Ten journals were reviewed to identify articles reporting murine tumor-graft experiments (Table 1); two of these (Cell and Science) did not contain any articles meeting our inclusion criteria. We identified 145 articles reporting murine tumor-graft experimentation data from the 8 remaining journals published between October 1, 2008 and December 31, 2008 (Supplemental Material). Research was conducted in multiple countries: 82 articles from North America (United States and Canada), 38 from Europe (Austria, Czech Republic, Finland, Germany, Italy, Netherlands, Spain, Sweden, Switzerland, and the United Kingdom), 23 from Asia (China, Japan, Korea, Singapore, Taiwan), and 1 each from Australia and India. Most articles (n = 108, 75%) mentioned that the experiments had undergone some form of review or met animal treatment guidelines. The number of experiments ranged from 1 to 7 per article (median 1); 120 (83%) articles included a single experiment. A majority of articles reported that experiments were performed in a single cell line (75 articles, 52%).

Table 1. Journals reviewed

Experimental design

As expected, there was a wide range of cancer types (n = 14), animal models (n = 7), and methods of treatment delivery (n = 10). Drug therapies were classified as cytotoxic, antibody, immunologic modulation, targeted/small molecule, vector based, and/or other. The use of radiation therapy in combination with drug treatment was noted in 7 articles. The majority of articles studied a single drug type (65%). There were no consistent criteria for the initiation of experimental treatment: some articles emphasized time since tumor inoculation (n = 80, 62%), others tumor size (n = 41, 32%), and some contained separate experiments emphasizing each (n = 8, 6%).

There was a large amount of missing data with regard to the experimental design (Table 2). In all, 50% of the articles were missing at least one component used to describe the murine model, including sex (29%), age (36%), animal model (2%), and number of animals per treatment arm (12%). An additional 23 (16%) articles specified only a range or minimum number of mice per arm. Information on tumor inoculation was missing in 6% of articles (1% location and 6% number of implanted cells). Information on important treatment parameters was missing for 14% of the experiments, including route of drug administration (6%), timing of treatment relative to tumor inoculation (6%), and the criteria used for treatment initiation (11%).

Table 2. Missing data in the experimental design and end-point measurements

Additional details of design elements were also lacking. Only one article included a justification for the sample size. Most of the articles had relatively small numbers of mice per arm (< 10 for all [n = 98] or some [n = 19] experiments within the article: n = 117/128, 91%). Although all articles examined multiple treatment arms, the details of how mice were assigned to treatments were provided in only 47 (33%) articles: 43 used randomization and 4 used size-matched cohorts. A total of 32 (22%) articles reported that multiple replicates of experiments were performed, with the results presented roughly equally as pooled results (n = 11, 34%) or representative results (n = 13, 41%), and 2 (6%) each presenting both or providing replicate results in the supplement. However, 4 (13%) did not specify which replicate(s) were presented. Overall, 69% of the experiments were missing some data that made the experiment difficult or impossible to reproduce without obtaining further details, the reference standard for our analysis.

End-point measurements

End-point measurements varied in both the type and number of outcomes (Table 3). Eighteen percent (n = 26) of the articles had a single outcome measurement; the remainder had multiple outcome measurements. The most commonly observed outcomes included tumor size (83%), biologic or pharmacodynamic changes (56%), and survival or cure rate outcomes (28%).

Table 3. End-point measurements and analyses used in the reporting of results in tumor-graft studies

In addition to noting whether or not tumor size was used as an outcome measurement, we also recorded the formula, the method of assessment (calipers, bioluminescence, MRI, etc.), and the tumor inoculation location for each article. Of the 120 articles reporting tumor size as an outcome, 53 (44%) were missing some information and 19 (16%) did not provide any information about the formula or method of assessment. Information was generally contained in the methods section (n = 85, 84%), although some articles provided details in the main text or figures (n = 5), a reference article (n = 7), or supplemental materials (n = 4). A total of 40 (33%) articles with tumor size as an outcome measurement did not provide the method of tumor measurement; of those that did, the majority (68, 85%) used calipers. The formula for the calculation of size was provided in 74 articles (62%), was not expected because imaging was used in 14 articles (12%), and was missing for the remaining 32 (26%) articles. A total of 15 different formulae were used to calculate tumor size in the 74 articles that reported a formula (Table 4), although 3 of the formulae were presented in multiple formats. The most common were (l × w²)/2 (n = 31, 42%) and (l × w²) × π/2 (n = 17, 23%).

Table 4. Formula representations of tumor size
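To illustrate the practical consequence of the formula variety summarized in Table 4, the minimal sketch below (hypothetical caliper readings; function names are ours) applies the two most commonly reported formulas to identical measurements. The resulting volumes differ by a factor of π, so absolute volumes are not directly comparable across articles that used different formulas.

import math

# Illustrative sketch: the two most common caliper-based volume formulas
# reported in the reviewed articles, applied to the same measurements.
# Function names and input values are hypothetical.

def volume_lw2_half(length_mm: float, width_mm: float) -> float:
    """V = (l x w^2) / 2, reported by 31 of 74 articles."""
    return length_mm * width_mm ** 2 / 2.0

def volume_lw2_pi_half(length_mm: float, width_mm: float) -> float:
    """V = (l x w^2) x pi / 2, reported by 17 of 74 articles."""
    return length_mm * width_mm ** 2 * math.pi / 2.0

length, width = 12.0, 8.0  # caliper readings in mm
print(volume_lw2_half(length, width))     # 384.0 mm^3
print(volume_lw2_pi_half(length, width))  # ~1206.4 mm^3 for identical calipers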

Analysis and reporting

The reported articles used a combination of figures, summary statistics, and formal statistical tests to summarize the findings of the experiments (Table 3). Graphical data included survival curves, bar graphs, and trends over time. The majority of articles (n = 115, 79%) described a statistical test either in the methods section (n = 98, 85%), the text (10%), or a combination of the two (5%). P-values derived from these tests were presented either in the text (n = 102, 89%) or in the text and abstract (n = 10, 9%), with the exception of 3 (2%) articles that did not appear to implement the tests outlined in the paper. Graphical and/or informal comparisons, without a formal statistical test, were the basis for conclusions in 12 (8%) articles, although three of these articles had described an analysis plan with formal statistical testing. The remainder of the articles (n = 18, 12%) either reported p-values without specifying the test (n = 16) or declared ‘statistical significance’ or ‘significance’ without providing a specific statistical test or p-value (n = 2). A wide variety of tests were used in the 112 articles that described and implemented formal statistical tests, including t-tests (58%), ANOVA (28%), log-rank (21%), and Wilcoxon tests (19%). Only 4 (4%) articles utilized repeated-measures methods (n = 2) or generalized estimating equations (n = 2).

The articles were also evaluated to determine whether or not an appropriate statistical test was selected. Overall, 16 (14%) of the 112 articles describing and implementing statistical tests were judged to have used appropriate methods. The most common error was failure to use repeated-measures methods to account for the evaluation of mice at multiple time points or with multiple tumors (n = 83/96, 86%). The second leading issue was failure to use proper survival analysis techniques for censored data (n = 14/96, 15%).
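To illustrate the censored-data point, the following sketch (hypothetical survival times; assumes the lifelines package is available) applies a log-rank test and a Kaplan-Meier fit, both of which treat mice still alive at the end of the study as censored rather than as observed events.

import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

# Hypothetical survival data in days; event = 0 means the mouse was censored
# (e.g., still alive at study end), which a t-test on survival times ignores.
control_days  = np.array([18, 21, 23, 25, 25, 28, 30, 30])
control_event = np.array([ 1,  1,  1,  1,  1,  1,  1,  0])
treated_days  = np.array([25, 28, 30, 34, 36, 40, 40, 40])
treated_event = np.array([ 1,  1,  1,  1,  1,  0,  0,  0])

result = logrank_test(control_days, treated_days,
                      event_observed_A=control_event,
                      event_observed_B=treated_event)
print(f"log-rank p = {result.p_value:.3f}")

# Kaplan-Meier estimation summarizes the same data while respecting censoring.
km = KaplanMeierFitter()
km.fit(treated_days, event_observed=treated_event, label="treated")
print(km.median_survival_time_)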

Of the 145 articles, the overwhelming majority reported a positive result (n = 132, 91%), and an additional 10 (7%) were identified as having mixed positive and negative results. Only 2 (1%) were categorized as negative and 1 (1%) was specifically described as inconclusive.

Comparison cohort

In order to determine whether the growing awareness of the importance of reproducible research and appropriate statistical analysis has had an impact on reporting since 2008, we reviewed articles published between April 1, 2012 and April 30, 2012 in the two journals with the greatest number of articles in the original review: Cancer Research (CR) and Clinical Cancer Research (CCR) (Supplemental Material). A total of 22 articles met the inclusion criteria (CR: n = 17 and CCR: n = 5).

The focus of the experiments remained on tumor size (n = 21, 95%), biologic or pharmacodynamic changes (n = 19, 86%), and survival or cure rate outcomes (n = 5, 23%). Overall 18 (82%) articles were missing some component of experimental design needed for reproducibility (our reference standard for this analysis). Details missing included the murine model (n = 14 [64%]), tumor inoculation (n = 4 [18%]), treatment (n = 7 [32%]) and endpoint measurements (n = 8 [38%]). The overall rate was similar to that observed in 2008 (82% vs 69%, p = 0.32). The number of types of volume quantification remained high with 5 different formulae in the 10 articles providing such information, a rate that was borderline significantly higher than originally observed (5 formulae in 10 articles vs 15 formulae in 74 articles, p = 0.08). None of the articles included a power calculation.

Statistical tests were used as the basis of conclusions for 20 (91%) articles. The most commonly used tests and summary statistics were t-tests (n = 13, 65%), ANOVA (n = 12, 60%), Wilcoxon tests (n = 5, 25%), and Kaplan-Meier curves (n = 4, 20%). One article used a generalized estimating equations approach. An appropriate analysis was applied in 7 of 20 articles, a significant increase from that observed in 2008 (35% vs 14%, p = 0.05). Failure to utilize repeated-measurement modeling (n = 10/15, 67%) or appropriate survival tests (n = 4/15, 27%) remained the most common issues.

As noted in 2008, almost all of the articles were positive (n = 20, 91%) with the remaining two articles presenting mixed results.

Discussion

Tumor-grafting experiments are commonly employed tools of drug development, though the ability to translate results seen in vivo into positive results in clinical trials has been mixed.Citation1-Citation3,Citation6,Citation7 A survey of editorial instructions to authors in cancer research journals reveals no published guidelines that must be met in order to report tumor-graft experiments. Kilkenny et al. reported that, in studies of mice, rats and primates, there were large inconsistencies in both the type and reporting of experimental designs, data collection techniques and analysis techniques, as well as an extremely high percentage of positive results.Citation9 Our review of human tumor-graft models in mice for oncology drug development revealed similar issues with these experiments.

Variability in study design was an expected finding, and in itself is not problematic. Many different murine models, cell lines, types of therapy (e.g., targeted, cytotoxic, immunogenic), and periods of tumor cell incubation are commonly used and may well be appropriate for answering different experimental questions. The quandary arises, however, when salient points of the experimental design are not reported so that the results are not reproducible. In order to reproduce an experiment, it is important to include all relevant details.Citation8 Overall, 100 (69%) of the experiments were missing some data on experimental design or outcome measurements that would make the experiment difficult or impossible to exactly reproduce without obtaining further details. Some of this information may have been left out of the main article due to space/editorial constraints, but full documentation could be included as a supplementary appendix or article posting. In fact, only 35 (24%) of the articles surveyed provided all of the details of experimental design in the Methods section. The remainder either included details in the text, figures or supplemental materials (n = 89, 61%) and/or in previously published papers (n = 21, 15%). Such sources, especially with the expanding role of online supplemental materials, could and should be utilized to ensure that all relevant information is provided.

In addition, some details of the experimental design would be well served by standardization across investigators. For a given cell line in a particular murine model, a standard number of injected cells and time to experimental treatment initiation would allow researchers to better compare the effect of treatments across experiments. Allowing an arbitrary time for treatment initiation also raises several issues. Tumors of different sizes could be treated, making experimental results difficult to compare; this is especially problematic because tumors of different sizes may be more or less responsive to experimental treatment. Furthermore, the arbitrarily selected starting point might not mirror clinical situations, in which treatment is often dictated by disease being large enough to be radiologically detectable. At present, these parameters are usually chosen based on the experience of the investigator or institutional norms; creating field-wide norms would allow for more consistency. This is especially important in today’s oncologic drug development environment, where multiple agents targeting the same pathways are being developed concurrently by multiple pharmaceutical companies, and the sheer volume of agents and proprietary logistics can sometimes prevent the same investigators from testing all compounds of a class. Finally, it may also be important to match the timing of therapy as closely as possible to how it will be given to humans in order to increase the likelihood of results being comparable between the two species. Such lack of comparability was highlighted in several reviews.Citation2,Citation4,Citation9

Equally important is the standardization of measurements. It is important to evaluate the outcome measurements being used for their potential to introduce bias, whether through data collection or through the measurement itself. In the articles we reviewed, outcome measurements varied greatly across studies, with the majority looking at tumor size (83%) but others looking at parameters such as mouse survival and toxicity. One of the methods for tracking tumor development was doubling time, which may be biased because it depends upon the ability to obtain baseline measurements; if an arbitrary time for treatment initiation is selected, this value may be missing for some of the experimental cohort. For those experiments monitoring doubling time or other methods of tracking tumor size, we saw no consistency in determining tumor size, with 15 methods for calculating size described (although 2 of the formulae were presented differently but were mathematically identical). In some cases an experiment that is not compelling in three dimensions may look more compelling in two dimensions. Without a fixed standard, the temptation is often to select the “best,” that is, most significant, finding. Over the past 30 y, the oncologic clinical trials community has recognized how challenging it is to evaluate drugs for further development without a standard measurement tool and has adopted common criteria for tumor response: first the World Health Organization (WHO) criteria in 1979, and, since 2000, the Response Evaluation Criteria in Solid Tumours (RECIST).Citation10-Citation12 As in clinical trials, the outcome measures for tumor-graft experiments should be defined prospectively and would benefit from standardization across all tumor-grafting studies to remove any possible partiality and increase generalizability. In in vivo experiments, the method of measurement is equally important. It is well recognized that caliper measurements of tumor size are inaccurate and have high levels of variability.Citation8 Technologies that are more accurate and reproducible, such as microCT or PET scans, may be preferable.Citation13
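As a minimal sketch of the doubling-time point (assuming simple exponential growth; values are hypothetical), the calculation below requires a baseline volume for every mouse, which may be unavailable when treatment starts at an arbitrary time.

import math

# Standard exponential-growth doubling time: t_d = t * ln(2) / ln(V_t / V_0).
def doubling_time(v0: float, vt: float, days: float) -> float:
    """Days required for the tumor to double, assuming exponential growth."""
    return days * math.log(2) / math.log(vt / v0)

print(doubling_time(v0=100.0, vt=400.0, days=10.0))  # 5.0 days

# If treatment starts at an arbitrary size rather than a fixed baseline,
# v0 may be missing for some mice and this summary cannot be computed.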

As mentioned above, care should be taken in selecting the outcome measurement and the statistical methods used to analyze those measurements. Proper consideration of the potential for missing data, especially informatively missing data, is a critical component of all analyses.Citation14 Most of the techniques found in the literature for comparing subgroups focused upon comparing the two groups at a single time point (e.g., t-test [58%] or Wilcoxon tests [19%]) despite the presence of replicate measurements (n = 83). However, this form of analysis ignores the longitudinal nature of the study and fails to incorporate all available data. Although 2-way ANOVA was applied in some cases, additional statistical techniques that take advantage of all available data, such as generalized estimating equations, mixed effects models or Bayesian hierarchical change-point methods, should be considered,Citation15 but were only applied in 4 articles. Although no currently available method is able to address all of the challenges inherent in animal models,Citation16 the more complex models are currently underutilized in favor of simpler methods despite their superiority. Furthermore, clear reporting of the planned and implemented techniques should be required. Otherwise, multiple analyses may be performed until one with a positive outcome is found. Graphical techniques may be acceptable, but failure to report the method(s) for evaluating the data, as occurred in 21% of the articles, is not reasonable.
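As an illustration of the longitudinal approaches discussed here, the sketch below (simulated data; assumes the statsmodels package; not the analysis used by any of the reviewed articles) fits a generalized estimating equation with an exchangeable working correlation and a random-intercept mixed model to log tumor volumes, with the day-by-group interaction comparing growth rates between arms.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Simulated long-format data: one row per mouse per measurement day.
rng = np.random.default_rng(0)
records = []
for group, growth in [("control", 0.25), ("treated", 0.15)]:
    for mouse in range(8):
        v0 = rng.normal(100, 10)  # starting volume in mm^3
        for day in range(0, 22, 3):
            vol = v0 * np.exp(growth * day) * rng.lognormal(0, 0.1)
            records.append({"mouse": f"{group}{mouse}", "group": group,
                            "day": day, "log_volume": np.log(vol)})
df = pd.DataFrame(records)

# GEE with an exchangeable working correlation accounts for repeated
# measurements on the same mouse; day:group tests for different growth rates.
gee = smf.gee("log_volume ~ day * group", groups="mouse", data=df,
              cov_struct=sm.cov_struct.Exchangeable(),
              family=sm.families.Gaussian())
print(gee.fit().summary())

# A mixed-effects alternative with a random intercept per mouse.
mlm = smf.mixedlm("log_volume ~ day * group", data=df, groups=df["mouse"])
print(mlm.fit().summary())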

One of the most striking results of our analyses was the proportion of trials reporting positive results (91%), which could be the result of reporting bias or data dredging. The impact of such biases is well established. For example, Sena et al. noted that as many as 1 in 7 animal stroke studies remain unpublished and estimated that these omissions could inflate estimates of efficacy by up to a third in systematic reviews.Citation17 Such reporting bias could have a number of sources. The most obvious is the investigators, who are not generally rewarded through publication or otherwise for writing up and reporting failed experiments of this nature; another is the journals themselves, which influence the nature of the material presented through their acceptance practices. Drug companies may be a further source of such bias, since they have no stake in the reporting of negative findings.Citation18 Because there are many barriers to the publication of negative studies, there is increased pressure to discover some aspect of the data that can be reported in a positive light, regardless of the original data analysis plan. Requiring standardized and rigorous reporting, along with an editorial emphasis on accepting well-designed experiments that yield negative results, could help temper both of these problems. If unsuccessful experiments are not reported, then we are doomed to repeat them, wasting resources, investigator time, and animal lives.

There has been recent movement in the field of animal research to address some of these issues. Several authors have outlined guidelines for improving reporting in general,Citation19 and for rodent studies in particular.Citation8 As part of this effort, the EQUATOR network has been active in the development of reporting guidelines (http://www.equator-network.org/resource-centre/library-of-health-research-reporting/reporting-guidelines/). The CAMARADES (Collaborative Approach to Meta-Analysis and Review of Animal Data from Experimental Studies) collaboration has been initiated to develop guidelines, methodology, and a database for supporting systematic reviews and meta-analyses (http://www.camarades.info/). Journals such as PLoS Biology have publicly discussed their role and responsibilities in promoting both the ethical treatment of animals and sound science.Citation20 Although limited in size, our review of the 2012 articles from CR and CCR showed a significant improvement in the use of statistical techniques compared with 2008 (35% vs 14%, p = 0.05), although the fraction using appropriate methods remained low. However, issues remain in the complete description of the experimental design and outcome measurements needed to properly reproduce an experiment, with 82% missing some component. The lessons learned from these efforts should be further applied to tumor-graft studies in cancer research.

In conclusion, our study is the first detailed review of the design and analysis of published tumor-grafting experiments in oncologic drug testing. It reveals a regular lack of detail in experimental methods and outcome measurements, resulting in the majority of experiments being irreproducible based on the information provided. We also found strong evidence of the potential for reporting bias favoring positive results. Consistency in experimental design and methods is a cardinal element of good science, all the more so in tumor-grafting experiments, which are now used ubiquitously in oncologic drug development. While a significant improvement in the proper utilization of statistical tests was observed, additional improvements are necessary. Our results speak to the need for mandatory publication guidelines from editorial boards and scientific leaders in our field, similar to those being developed by groups such as EQUATOR but targeted to the issues unique to tumor-graft studies, to ensure continued improvement in reporting. At a minimum, these should require that all of the methodology needed for independent duplication of an experiment, as well as accepted standardized outcome measurements and appropriate statistical analysis plans, be included.

Materials and Methods

Article selection

Top-ranking journals in the medical field (Cell, Journal of Biological Chemistry, Nature, Nature Medicine and Science) and oncology (Cancer Cell, Cancer Research, Clinical Cancer Research, Journal of the National Cancer Institute and Oncogene), as determined by their Institute of Scientific Information (ISI) impact factors, were reviewed for articles with tumor-graft experiments in murine models over a three-month interval in 2008. Only those articles reporting tumor-graft experiments in murine models with therapeutic drug treatment were included in this review (Fig. 1). Studies with transductions or transfections of cells prior to tumor inoculation, without any additional therapeutic drug treatment, were excluded. An additional month of articles from 2012 in Cancer Research and Clinical Cancer Research, the two journals with the largest number of articles in the initial review, was examined in order to evaluate changes over time.

Figure 1. Consort diagram of article inclusion/exclusion parameters


Data retrieval

Publications with tumor-graft and drug therapeutic experiments were identified and data were recorded according to journal publication, experimental design (murine model, tumor inoculation, and therapeutic intervention/drug), end-point measurement, statistical analysis, and result presentation. We summarized the findings by article. For each article, the journal, publication date (issue), number of experiments, and number of cell lines tested were recorded.

For the murine model, the type, sex, age at tumor inoculation, and number of mice per arm were noted. For tumor inoculation, cells were classified according to species of origin, tissue of origin, and history of vector manipulation (e.g., adding a luciferase reporter or a tumor suppressor gene to the parental genome). The number of cells and the location of injection per tumor inoculation were also noted. Drug therapeutics and the route of drug delivery were recorded. The reported impetus for the start of drug treatment (e.g., elapsed time post tumor inoculation, measurable tumor growth) was noted, as were the different methods of deriving tumor growth (e.g., tumor size, diameter, weight) and the endpoints of tumor-graft data together with their timing relative to the end of treatment. Our reference standard for this analysis was whether an article included all information necessary for a reader to replicate the experiments; an experiment was considered reproducible if all of the above components were included in the article.

Design elements including power calculations, randomization and experiment replication were evaluated. Statistical analysis techniques were noted and the statistical methodology was reviewed to determine whether or not appropriate techniques were utilized. For outcomes with repeated measurements over time (e.g., tumor volume growth), techniques that take into account the repeated measurements (e.g., two-way ANOVA, generalized estimating equations, mixed effects models or area under the curve analyses) are appropriate as compared with those that do not (e.g., repeated t-tests). For time-to-event outcomes (e.g., survival), specific tests to account for censored data (e.g., log-rank tests or Cox proportional hazards models) should be used. Additional issues that were examined included, but were not limited to, the inclusion of a test for interaction if the significance of effect for two groups was compared, appropriate modeling of dose response, description of a statistical test corresponding to all p-values, and the inclusion of a statistical test and a p-value if the results were declared significant.
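As one concrete example of a repeated-measurement-aware approach named above, the sketch below (hypothetical data and values) collapses each mouse's growth curve into an area-under-the-curve summary and then compares the per-mouse summaries between arms with a single rank-based test.

import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import mannwhitneyu

# Hypothetical per-mouse tumor volumes (mm^3) measured on the same days.
days = np.array([0, 3, 6, 9, 12, 15])
control = np.array([[100, 140, 200, 290, 400, 560],
                    [ 90, 130, 180, 260, 380, 520],
                    [110, 150, 210, 300, 430, 600],
                    [105, 145, 205, 295, 410, 570]])
treated = np.array([[100, 120, 150, 180, 220, 260],
                    [ 95, 110, 140, 170, 200, 240],
                    [105, 125, 155, 190, 230, 280],
                    [ 98, 118, 148, 178, 215, 255]])

# Collapse each mouse's trajectory into one number (area under the growth
# curve), then compare the per-mouse summaries between arms with one test.
auc_control = trapezoid(control, days, axis=1)
auc_treated = trapezoid(treated, days, axis=1)
stat, p = mannwhitneyu(auc_control, auc_treated, alternative="two-sided")
print(f"AUC comparison p = {p:.3f}")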

The reporting of a positive or negative outcome was also recorded. Results were classified as positive, negative, mixed, or unavailable. A result was deemed positive when there was a statistically significant test (e.g., p < 0.05), when a discernible difference between treated and control groups was depicted in a figure, and/or when the authors used positive language. A result was deemed mixed when identifiable parts of the article claimed both positive and negative results.

Statistical analysis

Data were extracted from each publication’s text, figures, supplementary text and figures, and/or any references. Any missing items were noted. Full descriptions were included in the records. Counts and proportions were used to document the distribution of categorical variables as well as the amount of missing data. Comparisons of the characteristics from articles in 2008 and 2012 were made using Fisher’s exact test and Poisson regression for categorical variables and rates, respectively.
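For illustration, a minimal sketch of the 2008-versus-2012 comparison using Fisher's exact test is shown below, with counts taken from the Results; the computed p-value may differ slightly from the published value depending on the denominators and test options used.

from scipy.stats import fisher_exact

# Articles judged to use an appropriate statistical analysis:
# 2008: 16 of 112 implementing formal tests; 2012: 7 of 20.
table = [[16, 112 - 16],
         [ 7,  20 -  7]]
odds_ratio, p_value = fisher_exact(table)  # two-sided by default
print(f"p = {p_value:.3f}")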


Disclosure of Potential Conflicts of Interest

No potential conflicts of interest were disclosed.

Acknowledgments

We would like to thank Dr. Scott Kern for his counsel and advice throughout this project, Dr. Elizabeth Jaffee for her editorial comments on the manuscript, and Miss Rachel Rorke for her help as a research assistant.

Financial Support: Elizabeth Sugar, Viraugh Foundation Grant.

Supplemental Material

Supplemental Material may be found here:

http://www.landesbioscience.com/journals/cbt/article/21782/

References

  • Hackam DG, Redelmeier DA. Translation of research evidence from animals to humans. JAMA 2006; 296:1731 - 2; http://dx.doi.org/10.1001/jama.296.14.1731; PMID: 17032985
  • Pound P, Ebrahim S, Sandercock P, Bracken MB, Roberts I, Reviewing Animal Trials Systematically (RATS) Group. Where is the evidence that animal research benefits humans?. BMJ 2004; 328:514 - 7; http://dx.doi.org/10.1136/bmj.328.7438.514; PMID: 14988196
  • Matthews RAJ. Medical progress depends on animal models - doesn’t it?. J R Soc Med 2008; 101:95 - 8; http://dx.doi.org/10.1258/jrsm.2007.070164; PMID: 18299631
  • van der Worp HB, Howells DW, Sena ES, Porritt MJ, Rewell S, O’Collins V, et al. Can animal models of disease reliably inform human studies?. PLoS Med 2010; 7:e1000245; http://dx.doi.org/10.1371/journal.pmed.1000245; PMID: 20361020
  • Horstmann E, McCabe MS, Grochow L, Yamamoto S, Rubinstein L, Budd T, et al. Risks and benefits of phase 1 oncology trials, 1991 through 2002. N Engl J Med 2005; 352:895 - 904; http://dx.doi.org/10.1056/NEJMsa042220; PMID: 15745980
  • Johnson JI, Decker S, Zaharevitz D, Rubinstein LV, Venditti JM, Schepartz S, et al. Relationships between drug activity in NCI preclinical in vitro and in vivo models and early clinical trials. Br J Cancer 2001; 84:1424 - 31; http://dx.doi.org/10.1054/bjoc.2001.1796; PMID: 11355958
  • Voskoglou-Nomikos T, Pater JL, Seymour L. Clinical predictive value of the in vitro cell line, human xenograft, and mouse allograft preclinical cancer models. Clin Cancer Res 2003; 9:4227 - 39; PMID: 14519650
  • Hollingshead MG. Antitumor efficacy testing in rodents. J Natl Cancer Inst 2008; 100:1500 - 10; http://dx.doi.org/10.1093/jnci/djn351; PMID: 18957675
  • Kilkenny C, Parsons N, Kadyszewski E, Festing MFW, Cuthill IC, Fry D, et al. Survey of the quality of experimental design, statistical analysis and reporting of research using animals. PLoS One 2009; 4:e7824; http://dx.doi.org/10.1371/journal.pone.0007824; PMID: 19956596
  • Miller AB, Hoogstraten B, Staquet M, Winkler A. Reporting results of cancer treatment. Cancer 1981; 47:207 - 14; http://dx.doi.org/10.1002/1097-0142(19810101)47:1<207::AID-CNCR2820470134>3.0.CO;2-6; PMID: 7459811
  • Therasse P, Arbuck SG, Eisenhauer EA, Wanders J, Kaplan RS, Rubinstein L, et al. New guidelines to evaluate the response to treatment in solid tumors (RECIST Guidelines). J Natl Cancer Inst 2000; 92:205 - 16; http://dx.doi.org/10.1093/jnci/92.3.205; PMID: 10655437
  • Eisenhauer EA, Therasse P, Bogaerts J, Schwartz LH, Sargent D, Ford R, et al. New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1). Eur J Cancer 2009; 45:228 - 47; http://dx.doi.org/10.1016/j.ejca.2008.10.026; PMID: 19097774
  • Jensen MM, Jørgensen JT, Binderup T, Kjaer A. Tumor volume in subcutaneous mouse xenografts measured by microCT is more accurate and reproducible than determined by 18F-FDG-microPET or external caliper. BMC Med Imaging 2008; 8:16; http://dx.doi.org/10.1186/1471-2342-8-16; PMID: 18925932
  • Heitjan DF, Manni A, Santen RJ. Statistical analysis of in vivo tumor growth experiments. Cancer Res 1993; 53:6042 - 50; PMID: 8261420
  • Zhao L, Morgan MA, Parsels LA, Maybaum J, Lawrence TS, Normolle D. Bayesian hierarchical changepoint methods in modeling the tumor growth profiles in xenograft experiments. Clin Cancer Res 2011; 17:1057 - 64; http://dx.doi.org/10.1158/1078-0432.CCR-10-1935; PMID: 21131555
  • Heitjan DF. Biology, models, and the analysis of tumor xenograft experiments. Clin Cancer Res 2011; 17:949 - 51; http://dx.doi.org/10.1158/1078-0432.CCR-10-3279; PMID: 21224374
  • Sena ES, van der Worp HB, Bath PMW, Howells DW, Macleod MR. Publication bias in reports of animal stroke studies leads to major overstatement of efficacy. PLoS Biol 2010; 8:e1000344; http://dx.doi.org/10.1371/journal.pbio.1000344; PMID: 20361022
  • Bourgeois FT, Murthy S, Mandl KD. Outcome reporting among drug trials registered in ClinicalTrials.gov. Ann Intern Med 2010; 153:158 - 66; PMID: 20679560
  • Kilkenny C, Browne WJ, Cuthill IC, Emerson M, Altman DG. Improving bioscience research reporting: the ARRIVE guidelines for reporting animal research. PLoS Biol 2010; 8:e1000412; http://dx.doi.org/10.1371/journal.pbio.1000412; PMID: 20613859
  • MacCallum CJ. Reporting animal studies: good science and a duty of care. PLoS Biol 2010; 8:e1000413; http://dx.doi.org/10.1371/journal.pbio.1000413; PMID: 20613860
