Research Article

Bootstrapped Edge Count Tests for Nonparametric Two-Sample Inference Under Heterogeneity

Received 09 Apr 2023, Accepted 24 Jun 2024, Accepted author version posted online: 01 Jul 2024

