Abstract
Hundreds of autism risk genes have been reported recently, mainly from genetic studies in which these genes carry more de novo mutations in autism subjects than in healthy controls. However, as a complex disease, autism is likely associated with additional risk genes, many of which may not be identifiable through de novo mutations. We hypothesize that more autism risk genes can be identified through their connections with known autism risk genes in personalized gene–gene interaction graphs. We estimate such personalized graphs using single-cell RNA sequencing (scRNA-seq) data while appropriately modeling the cell dependence and possible zero-inflation in these data. The sample size, which is the number of cells per individual, ranges from 891 to 1241 in our case study using scRNA-seq data from autism subjects and controls. We consider 1500 genes in our analysis. Since the number of genes is larger than or comparable to the sample size, we perform penalized estimation. We score each gene’s relevance by applying a simple graph kernel smoothing method to each personalized graph. The molecular functions of the top-scored genes are related to autism. For example, the candidate gene RYR2, which encodes the protein ryanodine receptor 2, is involved in neurotransmission, a process that is impaired in patients with autism spectrum disorder (ASD). While our method provides a systematic and unbiased approach to prioritizing autism risk genes, the relevance of these genes needs to be further validated in functional studies. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
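To illustrate the idea of scoring genes by their connections to known risk genes, the following minimal sketch applies one common form of graph kernel smoothing, the regularized Laplacian kernel, to a toy graph. The abstract does not specify which kernel is used, so the kernel choice, the function name `smooth_scores`, the parameter `alpha`, and the four-gene example graph are all assumptions made for illustration only.

```python
import numpy as np


def smooth_scores(adjacency, seed_labels, alpha=1.0):
    """Smooth seed labels over a graph by solving (I + alpha * L) f = y.

    adjacency   : (p, p) nonnegative edge weights of an estimated gene graph
    seed_labels : length-p vector, 1 for known risk genes and 0 otherwise
    alpha       : hypothetical smoothing strength (larger spreads scores further)
    """
    A = np.asarray(adjacency, dtype=float)
    A = (A + A.T) / 2.0                 # enforce symmetry
    np.fill_diagonal(A, 0.0)            # ignore self-loops
    L = np.diag(A.sum(axis=1)) - A      # combinatorial graph Laplacian
    y = np.asarray(seed_labels, dtype=float)
    return np.linalg.solve(np.eye(len(y)) + alpha * L, y)


# Toy graph (assumed): genes 0-1-2 form a path, gene 3 is isolated.
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
])
seed_labels = np.array([1, 0, 0, 0])    # gene 0 is the only known risk gene

scores = smooth_scores(adjacency, seed_labels)
print(scores)
```

In this toy example the score decays with graph distance from the known risk gene, and the isolated gene receives a score of zero. With personalized graphs, one such score vector would be computed per individual.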
Supplementary Materials
In the Supplementary Materials, we provide additional numerical results and analyses. In Section S.1, we further justify the assumptions of our proposed methods. In Section S.2, we compare different graph estimation methods at the same sparsity level. In Section S.3, we analyze the gene–gene interaction graph estimated in Section 6 from several perspectives. In Section S.4, we evaluate the effectiveness of data imputation on graph estimation. In Section S.5, we compare our approach with its oracle cell-dependence counterpart as well as several other methods.
Acknowledgments
The authors thank the editor, the associate editor, and reviewers, whose helpful comments and suggestions led to a much improved presentation.