Research Article

A comparison of single imputation and multiple imputation methods for missing data in different oncogene expression profiles

Pages 113-127 | Received 04 Jun 2018, Accepted 10 Dec 2021, Published online: 07 Feb 2022
