Abstract
To learn about the progression of a complex disease, it is necessary to understand the physiology and function of many genes operating together in distinct interactions as a system. In order to significantly advance our understanding of the function of a system, we need to learn the causal relationships among its modeled genes. To this end, it is desirable to compare experiments of the system under complete interventions of some genes, e.g., knock-out of some genes, with experiments of the system without interventions. However, it is expensive and difficult (if not impossible) to conduct wet lab experiments of complete interventions of genes in animal models, e.g., a mouse model. Thus, it will be helpful if we can discover promising causal relationships among genes with observational data alone in order to identify promising genes to perturb in the system that can later be verified in wet laboratories. While causal Bayesian networks have been actively used in discovering gene pathways, most of the algorithms that discover pairwise causal relationships from observational data alone identify only a small number of significant pairwise causal relationships, even with a large dataset. In this article, we introduce new causal discovery algorithms—the Equivalence Local Implicit latent variable scoring Method (EquLIM) and EquLIM with Markov chain Monte Carlo search algorithm (EquLIM-MCMC)—that identify promising causal relationships even with a small observational dataset.