Short Communication

Detecting differentially expressed genes of heterogeneous and positively skewed data using half Johnson’s modified t-test

Article: 1220066 | Received 13 May 2016, Accepted 22 Jul 2016, Published online: 31 Aug 2016
 

Abstract

Background: Microarray technology allows the simultaneous detection of thousands of genes within a single experiment. In the two-sample setting, Student’s t-test can be used to compare the mean expression of a gene, taken from replicate arrays, and so detect differential expression under the condition being studied, such as a disease. However, a general statistical test may lack the power to correctly detect differentially expressed genes when the data are heterogeneous and positively skewed. Methods: Here we define a differentially expressed gene as one whose expression differs significantly in mean, variance, or both between the two groups of microarrays. Monte Carlo simulation shows that the “half Johnson’s modified t-test” maintains quite accurate type I error rates under both normal and non-normal distributions. Overall, the half Johnson’s modified t-test was also more powerful than the half Student’s t-test when the ratio of standard deviations between the case and control groups is greater than 1. Results: Analysis of a colon cancer data set shows that, with the false discovery rate (FDR) controlled at 0.05, the half Johnson’s modified t-test detects 429 differentially expressed genes, more than the 344 detected by the half Student’s t-test. To target 100 priority genes, the FDR needs to be set at only 4.28 × 10⁻⁸ for the half Johnson’s modified t-test, whereas for the half Student’s t-test it is set at 5.39 × 10⁻⁴. Conclusions: The half Johnson’s modified t-test is recommended for detecting differentially expressed genes in data that are heterogeneous and ONLY positively skewed.
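As an illustration of what "controlling the FDR at 0.05" involves when counting detected genes, the following is a minimal sketch of the Benjamini-Hochberg step-up procedure. The abstract does not state which FDR procedure was used, so the choice of Benjamini-Hochberg is an assumption; the function name `benjamini_hochberg` and the p-values in `p_demo` are hypothetical and are not taken from the colon cancer data.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans, True where a gene is declared significant.

    Assumption: the Benjamini-Hochberg step-up rule is used as the FDR
    procedure; the article's abstract does not name the procedure.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the genes, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) such that p_(k) <= (k / m) * fdr.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            k_max = rank
    # Every gene ranked at or below k_max is declared significant.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical per-gene p-values, for illustration only.
p_demo = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(p_demo, fdr=0.05)
n_detected = sum(flags)  # number of genes declared differentially expressed
```

Under this rule, a test that yields smaller p-values for truly differentially expressed genes passes more of them at the same FDR level, which is the sense in which the two tests are compared by the number of genes detected.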

Public Interest Statement

Gene expression has been a popular research topic in recent years, and Student’s t-test is commonly used to screen for disease-related genes. However, when research focuses on heterogeneous and positively skewed expression data, the mean expression levels of the case and control groups may be similar, so the difference may not be significant under the conventional Student’s t-test. This study proposes the half Johnson’s modified t-test to correctly detect differentially expressed genes in heterogeneous and positively skewed data. Its test statistic uses only the sample standard deviation of the control group; that of the case group is not included. With the false discovery rate controlled at a cut-off of 0.05 in colon cancer gene expression data, the half Johnson’s modified t-test detected 364 more significant genes than the conventional Student’s t-test. The half Johnson’s modified t-test is therefore recommended as a method for detecting differentially expressed genes in heterogeneous and positively skewed data.

Competing Interest

The authors declare no competing interest.

Additional information

Notes on contributors

I-Shiang Tzeng

I-Shiang Tzeng is a doctoral researcher at the National Translational Medicine and Clinical trial Resource Center (NTCRC), which is formed by Academia Sinica, National Taiwan University and National Yang-Ming University, Taiwan. He currently serves as a bioinformatics and biostatistics consultant at NTCRC and is also an adjunct assistant professor in the Department of Statistics, National Taipei University, Taiwan. His research areas include biostatistics and epidemiologic methods, along with further studies proposing potentially more powerful methods for detecting differentially expressed genes. His research interests also include age-period-cohort (APC) modeling, from social issues to biological issues, and the analysis of the APC models that arise in these applications.