Abstract
We study the problem of nonparametric dependence detection. Many existing methods may suffer severe power loss due to nonuniform consistency, which we illustrate with a paradox. To avoid such power loss, we approach the nonparametric test of independence through the new framework of binary expansion statistics (BEStat) and binary expansion testing (BET), which examine dependence through a novel binary expansion filtration approximation of the copula. Through a Hadamard transform, we find that the symmetry statistics in the filtration are complete sufficient statistics for dependence. These statistics are also uncorrelated under the null. By using symmetry statistics, the BET avoids the problem of nonuniform consistency and improves upon a wide class of commonly used methods (a) by achieving the minimax rate in sample size requirement for reliable power and (b) by providing clear interpretations of global relationships upon rejection of independence. The binary expansion approach also connects the symmetry statistics with the current computing system to facilitate efficient bitwise implementation. We illustrate the BET with a study of the distribution of stars in the night sky and with an exploratory data analysis of the TCGA breast cancer data. Supplementary materials for this article are available online.
Supplementary Materials
Online supplementary materials for this article include additional numerical studies, proofs of the results, and R functions used in the numerical studies.
Acknowledgments
The author thanks Richard Berk, Larry Brown, Andreas Buja, Edward George, Arun Kumar Kuchibhotla, Linda Zhao, and Zhigen Zhao for inspiring discussions that stimulated this research. The author also thanks Edoardo Airoldi, Mike Baiocchi, Shankar Bhamidi, Bhaswar Bhattacharya, Rong Chen, Jessi Cisewski, Bradley Efron, Jianqing Fan, Dean Foster, Andrew Gelman, Jan Hannig, Ruth Heller, Peter Hoff, Katherine Hoadley, Xiaoming Huo, Pierre Jacob, Vinay Kashyap, Michael Kosorok, S.C. Samuel Kou, Duyeol Lee, Michael Levine, Ping Li, Yun Li, Xihong Lin, Oliver Linton, Han Liu, Jun Liu, Mike Love, Li Ma, Zongming Ma, Steve Marron, Xiao-Li Meng, Joel Parker, Charles Perou, Vladas Pipiras, Richard Samworth, David Siegmund, Dylan Small, Robert Stine, William Strawderman, Weijie Su, Gábor Székely, Xinlu Tan, Yihong Wu, Han Xiao, Daniel Yekutieli, Ming Yuan, Yuan Yuan, Cun-Hui Zhang, Nancy Zhang, Tingting Zhang, and Harrison Zhou for valuable comments and suggestions. This research was completed while the author was visiting Princeton University. The author thanks Jianqing Fan and the Department of Operations Research and Financial Engineering at Princeton for their warm hospitality.