Statistical Computing and Graphics

A Set of Efficient Methods to Generate High-Dimensional Binary Data With Specified Correlation Structures

Pages 310-322 | Received 10 Dec 2019, Accepted 22 Aug 2020, Published online: 16 Oct 2020
 

Abstract

High-dimensional correlated binary data arise in many areas, such as observed genetic variations in biomedical research. Data simulation can help researchers evaluate the efficiency and explore the properties of different computational and statistical methods. In addition, some statistical methods, such as Monte Carlo methods, rely on data simulation. Lunn and Davies proposed methods with linear time complexity to generate correlated binary variables with three common correlation structures. However, their methods cannot accommodate unequal probabilities. In this article, we introduce several computationally efficient algorithms that generate high-dimensional binary data with specified correlation structures and unequal probabilities. Our algorithms have linear time complexity with respect to the dimension for three commonly studied correlation structures, namely the exchangeable, decaying-product, and K-dependent correlation structures. In addition, we extend our algorithms to generate binary data with specified nonnegative correlation matrices that satisfy the validity condition, with quadratic time complexity. We provide an R package, CorBin, to implement our simulation methods. Compared with existing packages for binary data generation, the time cost to generate a 100-dimensional binary vector can be reduced by up to 10^5-fold for the common correlation structures and up to 10^3-fold for general correlation matrices, and the efficiency gain increases further as the dimension grows. The R package CorBin is available on CRAN at https://cran.r-project.org/.
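
To make the starting point concrete, the following is a minimal sketch of the equal-probability exchangeable construction usually attributed to Lunn and Davies, not the algorithms of this article and not the CorBin interface. It assumes a common marginal probability p and a common pairwise correlation rho between 0 and 1; the function name `lunn_davies_exchangeable` and its parameters are hypothetical, chosen for illustration only. Each coordinate copies a shared Bernoulli(p) draw with probability sqrt(rho) and otherwise takes its own independent Bernoulli(p) draw, so one vector costs time linear in the dimension.

```python
import random


def lunn_davies_exchangeable(m, p, rho, rng=None):
    """Draw one m-dimensional binary vector with equal marginals p and
    common pairwise correlation rho (illustrative sketch, equal p only)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1] for this construction")
    rng = rng or random.Random()

    mix = rho ** 0.5                      # P(coordinate copies the shared draw)
    shared = 1 if rng.random() < p else 0  # Z ~ Bernoulli(p), shared by all

    vector = []
    for _ in range(m):
        copies_shared = rng.random() < mix      # U_i ~ Bernoulli(sqrt(rho))
        own = 1 if rng.random() < p else 0      # Y_i ~ Bernoulli(p)
        vector.append(shared if copies_shared else own)
    return vector


if __name__ == "__main__":
    # Hypothetical usage: one 10-dimensional draw with p = 0.3, rho = 0.25.
    print(lunn_davies_exchangeable(10, 0.3, 0.25, random.Random(2020)))
```

Under this construction two coordinates covary only when both copy the shared draw, which happens with probability rho, so their correlation is rho while every marginal stays at p. Because the shared draw has a single success probability, the sketch cannot assign different probabilities to different coordinates, which is the limitation the abstract describes.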

Supplementary Materials

CorBin: R-package CorBin containing code to implement the algorithms described in the article. (GNU zipped tar file)

CorBin-manual: User manual for R package CorBin. (.pdf file)

Examples: The demonstration data comprise five CSV files (Example1-5.csv), corresponding to the five examples described in Section 3.1, which illustrate the construction of the algorithms and the choice of parameters. (.csv file)

Example-code: The reproducing code for demonstration data. (.R file)

Acknowledgments

We thank the anonymous reviewer and the editor for their highly constructive and detailed feedback that helped us improve our article substantially.

Additional information

Funding

This research was supported in part by the NSF grant DMS 1713120.
