Applications and Case Studies

Batch Effects Correction with Unknown Subtypes

Pages 581-594 | Received 01 May 2016, Published online: 13 Nov 2018

References

  • Alli, E., Yang, J., and Hait, W. (2007), “Silencing of Stathmin Induces Tumor-Suppressor Function in Breast Cancer Cell Lines Harboring Mutant p53,” Oncogene, 26, 1003–1012.
  • Banfield, J. D., and Raftery, A. E. (1993), “Model-Based Gaussian and Non-Gaussian Clustering,” Biometrics, 49, 803–821.
  • Bickel, P. J., and Levina, E. (2004), “Some Theory for Fisher’s Linear Discriminant Function, ‘Naive Bayes’, and Some Alternatives When there are Many More Variables than Observations,” Bernoulli, 10, 989–1010.
  • Carey, L. A., Perou, C. M., Livasy, C. A., Dressler, L. G., Cowan, D., Conway, K., Karaca, G., Troester, M. A., Tse, C. K., Edmiston, S., Deming, S. L., Geradts, J., Cheang, M. C. U., Nielsen, T. O., Moorman, P. G., Earp, H. S., and Millikan, R. C. (2006), “Race, Breast Cancer Subtypes, and Survival in the Carolina Breast Cancer Study,” Journal of the American Medical Association, 295, 2492–2502.
  • Casella, G., and Berger, R. L. (2002), Statistical Inference (Vol. 2), Pacific Grove, CA: Duxbury.
  • Chahrour, M., Jung, S. Y., Shaw, C., Zhou, X., Wong, S. T., Qin, J., and Zoghbi, H. Y. (2008), “Mecp2, A Key Contributor to Neurological Disease, Activates and Represses Transcription,” Science, 320, 1224–1229.
  • Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977), “Maximum Likelihood from Incomplete Data via the EM Algorithm,” Journal of the Royal Statistical Society, Series B, 39, 1–38.
  • Desmedt, C., Piette, F., Loi, S., Wang, Y., Lallemand, F., Haibe-Kains, B., Viale, G., Delorenzi, M., Zhang, Y., Saghatchian d'Assignies, M., Bergh, J., Lidereau, R., Ellis, P., Harris, A. L., Klijn, J. G. M., Foekens, J. A., Cardoso, F., Piccart, M. J., Buyse, M., and Sotiriou, C. (2007), “Strong Time Dependence of the 76-gene Prognostic Signature for Node-Negative Breast Cancer Patients in the Transbig Multicenter Independent Validation Series,” Clinical Cancer Research, 13, 3207–3214.
  • Edgar, R., Domrachev, M., and Lash, A. E. (2002), “Gene Expression Omnibus: NCBI Gene Expression and Hybridization Array Data Repository,” Nucleic Acids Research, 30, 207–210.
  • Fraley, C., and Raftery, A. E. (2002), “Model-Based Clustering, Discriminant Analysis, and Density Estimation,” Journal of the American Statistical Association, 97, 611–631.
  • Franks, A. M., Csárdi, G., Drummond, D. A., and Airoldi, E. M. (2015), “Estimating A Structured Covariance Matrix from Multilab Measurements in High-Throughput Biology,” Journal of the American Statistical Association, 110, 27–44.
  • Fujita, N., Jaye, D. L., Kajita, M., Geigerman, C., Moreno, C. S., and Wade, P. A. (2003), “Mta3, a Mi-2/NuRD Complex Subunit, Regulates An Invasive Growth Pathway in Breast Cancer,” Cell, 113, 207–219.
  • Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2014), Bayesian Data Analysis (Vol. 2), Boca Raton, FL: Chapman & Hall/CRC.
  • Geman, S., and Geman, D. (1984), “Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6, 721–741.
  • George, E. I., and McCulloch, R. E. (1993), “Variable Selection via Gibbs Sampling,” Journal of the American Statistical Association, 88, 881–889.
  • Hein, A.-M. K., Richardson, S., Causton, H. C., Ambler, G. K., and Green, P. J. (2005), “Bgx: A Fully Bayesian Integrated Approach to the Analysis of Affymetrix GeneChip Data,” Biostatistics, 6, 349–373.
  • Hicks, S. C., Teng, M., and Irizarry, R. A. (2015), “On the Widespread and Critical Impact of Systematic Bias and Batch Effects in Single-Cell RNA-seq Data,” bioRxiv, 025528.
  • Hubert, L., and Arabie, P. (1985), “Comparing Partitions,” Journal of Classification, 2, 193–218.
  • Huo, Z., Ding, Y., Liu, S., Oesterreich, S., and Tseng, G. (2016), “Meta-analytic Framework for Sparse k-Means to Identify Disease Subtypes in Multiple Transcriptomic Studies,” Journal of the American Statistical Association, 111, 27–42.
  • Irizarry, R. A., Warren, D., Spencer, F., Kim, I. F., Biswal, S., Frank, B. C., Gabrielson, E., Garcia, J. G. N., Geoghegan, J., Germino, G., Griffin, C., Hilmer, S. C., Hoffman, E., Jedlicka, A. E., Kawasaki, E., Martínez-Murillo, F., Morsberger, L., Lee, H., Petersen, D., Quackenbush, J., Scott, A., Wilson, M., Yang, Y., Ye, S. Q., and Yu, W. (2005), “Multiple-Laboratory Comparison of Microarray Platforms,” Nature Methods, 2, 345–350.
  • Jacob, L., Gagnon-Bartsch, J. A., and Speed, T. P. (2016), “Correcting Gene Expression Data When Neither the Unwanted Variation Nor the Factor of Interest are Observed,” Biostatistics, 17, 16–28.
  • Ji, Y., Wu, C., Liu, P., Wang, J., and Coombes, K. R. (2005), “Applications of Beta-Mixture Models in Bioinformatics,” Bioinformatics, 21, 2118–2122.
  • Johnson, W. E., Li, C., and Rabinovic, A. (2007), “Adjusting Batch Effects in Microarray Expression Data Using Empirical Bayes Methods,” Biostatistics, 8, 118–127.
  • Karlis, D., and Meligkotsidou, L. (2007), “Finite Mixtures of Multivariate Poisson Distributions with Application,” Journal of Statistical Planning and Inference, 137, 1942–1960.
  • Leek, J. T. (2014), “svaseq: Removing Batch Effects and Other Unwanted Noise from Sequencing Data,” Nucleic Acids Research, 42, e161.
  • Leek, J. T., Scharpf, R. B., Bravo, H. C., Simcha, D., Langmead, B., et al. (2010), “Tackling the Widespread and Critical Impact of Batch Effects in High-Throughput Data,” Nature Reviews Genetics, 11, 733–739.
  • Leek, J. T., and Storey, J. D. (2007), “Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis,” PLoS Genetics, 3, e161.
  • Maitra, R., and Ramler, I. P. (2009), “Clustering in the Presence of Scatter,” Biometrics, 65, 341–352.
  • McCall, M. N., Bolstad, B. M., and Irizarry, R. A. (2010), “Frozen Robust Multiarray Analysis (FRMA),” Biostatistics, 11, 242–253.
  • McLachlan, G., and Peel, D. (2004), Finite Mixture Models, New York: Wiley.
  • Newton, M. A., Noueiry, A., Sarkar, D., and Ahlquist, P. (2004), “Detecting Differential Gene Expression with a Semiparametric Hierarchical Mixture Method,” Biostatistics, 5, 155–176.
  • Onitilo, A. A., Engel, J. M., Greenlee, R. T., and Mukesh, B. N. (2009), “Breast Cancer Subtypes based on ER/PR and Her2 Expression: Comparison of Clinicopathologic Features and Survival,” Clinical Medicine & Research, 7, 4–13.
  • Pan, W., and Shen, X. (2007), “Penalized Model-Based Clustering with Application to Variable Selection,” The Journal of Machine Learning Research, 8, 1145–1164.
  • Peterson, C., Stingo, F. C., and Vannucci, M. (2015), “Bayesian Inference of Multiple Gaussian Graphical Models,” Journal of the American Statistical Association, 110, 159–174.
  • Piccart-Gebhart, M. J., Procter, M., Leyland-Jones, B., Goldhirsch, A., Untch, M., et al. (2005), “Trastuzumab after Adjuvant Chemotherapy in Her2-Positive Breast Cancer,” New England Journal of Medicine, 353, 1659–1672.
  • Piccolo, S. R., Sun, Y., Campbell, J. D., Lenburg, M. E., Bild, A. H., and Johnson, W. E. (2012), “A Single-Sample Microarray Normalization Method to Facilitate Personalized-Medicine Workflows,” Genomics, 100, 337–344.
  • Pickrell, J. K., Marioni, J. C., Pai, A. A., Degner, J. F., Engelhardt, B. E., Nkadori, E., Veyrieras, J.-B., Stephens, M., Gilad, Y., and Pritchard, J. K. (2010), “Understanding Mechanisms Underlying Human Gene Expression Variation with RNA Sequencing,” Nature, 464, 768–772.
  • Ritter, G. (2014), Robust Cluster Analysis and Variable Selection, Boca Raton, FL: CRC Press.
  • Robert, C., and Casella, G. (2013), Monte Carlo Statistical Methods, New York: Springer Science & Business Media.
  • Schwarz, G. (1978), “Estimating the Dimension of a Model,” The Annals of Statistics, 6, 461–464.
  • Slamon, D. J., Clark, G. M., Wong, S. G., Levin, W. J., Ullrich, A., and McGuire, W. L. (1987), “Human Breast Cancer: Correlation of Relapse and Survival with Amplification of the Her-2/Neu Oncogene,” Science, 235, 177–182.
  • Suárez-Fariñas, M., Shah, K. R., Haider, A. S., Krueger, J. G., and Lowes, M. A. (2010), “Personalized Medicine in Psoriasis: Developing a Genomic Classifier to Predict Histological Response to Alefacept,” BMC Dermatology, 10, 1–8.
  • Taub, M. A., Corrada Bravo, H., and Irizarry, R. A. (2010), “Overcoming Bias and Systematic Errors in Next Generation Sequencing Data,” Genome Medicine, 2, 87.
  • The Cancer Genome Atlas Network (2012), “Comprehensive Molecular Portraits of Human Breast Tumours,” Nature, 490, 61–70.
  • Tibshirani, R. (1996), “Regression Shrinkage and Selection via the Lasso,” Journal of the Royal Statistical Society, Series B, 58, 267–288.
  • Tseng, G. C., and Wong, W. H. (2005), “Tight Clustering: A Resampling-Based Approach for Identifying Stable and Tight Patterns in Data,” Biometrics, 61, 10–16.
  • Wang, S., and Zhu, J. (2008), “Variable Selection for Model-Based High-Dimensional Clustering and its Application to Microarray Data,” Biometrics, 64, 440–448.
  • Wang, Y., Klijn, J. G., Zhang, Y., Sieuwerts, A. M., Look, M. P., Yang, F., Talantov, D., Timmermans, M., Meijer-van Gelder, M. E., Yu, J., Jatkoe, T., Berns, E. M. J. J., Atkins, D., and Foekens, J. A. (2005), “Gene-Expression Profiles to Predict Distant Metastasis of Lymph-Node-Negative Primary Breast Cancer,” The Lancet, 365, 671–679.
  • Witten, D. M., and Tibshirani, R. (2012), “A Framework for Feature Selection in Clustering,” Journal of the American Statistical Association, 105, 713–726.
  • Wolf, I., Levanon-Cohen, S., Bose, S., Ligumsky, H., Sredni, B., Kanety, H., Kuro-o, M., Karlan, B., Kaufman, B., Koeffler, H. P., and Rubinek, T. (2008), “Klotho: A Tumor Suppressor and A Modulator of the igf-1 and fgf Pathways in Human Breast Cancer,” Oncogene, 27, 7094–7105.
  • Xia, W., Chen, J.-S., Zhou, X., Sun, P.-R., Lee, D.-F., Liao, Y., Zhou, B. P., and Hung, M.-C. (2004), “Phosphorylation/Cytoplasmic Localization of p21cip1/waf1 is Associated with Her2/neu Overexpression and Provides a Novel Combination Predictor for Poor Prognosis in Breast Cancer Patients,” Clinical Cancer Research, 10, 3815–3824.
  • Yakowitz, S. J., and Spragins, J. D. (1968), “On the Identifiability of Finite Mixtures,” The Annals of Mathematical Statistics, 39, 209–214.
  • Yeung, K. Y., Fraley, C., Murua, A., Raftery, A. E., and Ruzzo, W. L. (2001), “Model-Based Clustering and Data Transformations for Gene Expression Data,” Bioinformatics, 17, 977–987.