Abstract
An increasing number of studies that are widely used in the demographic research community have collected genome-wide data from their respondents. It is therefore important that demographers have a proper understanding of some of the methodological tools needed to analyze such data. This article details the underlying methodology behind one of the most common techniques for analyzing genome-wide data, genome-wide complex trait analysis (GCTA). GCTA models provide heritability estimates for health, health behaviors, or indicators of attainment using data from unrelated persons. Our goal was to describe this model, highlight the utility of the model for biodemographic research, and demonstrate the performance of this approach under modifications to the underlying assumptions. The first set of modifications involved changing the nature of the genetic data used to compute genetic similarities between individuals (the genetic relationship matrix). We then explored the sensitivity of the model to heteroscedastic errors. In general, GCTA estimates are found to be robust to the modifications proposed here, but we also highlight potential limitations of GCTA estimates.
Funding
This research was supported in part by the following grants from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD): R01HD060726 and R21HD078031. We also received support from the NICHD-supported University of Colorado Population Center (CUPC R24 HD066613), and Wedow was supported by the National Science Foundation’s Graduate Research Fellowship Program (DGE 1144083). The HRS (Health and Retirement Study) is sponsored by the National Institute on Aging (NIA U01AG009740) and is conducted by the University of Michigan.
Notes
1 GCTA analyses nearly always focus on SNPs rather than other genetic variants. In this article, the terms genetic markers and variants are used interchangeably with SNPs.
2 Diagonal elements of A (when j = k) are inbreeding coefficients. We do not discuss them further here since they are of marginal interest in the estimation of heritability (see Yang et al. 2011 for information on their calculation).
3 Hardy-Weinberg equilibrium (HWE) occurs when observed genotype frequencies match the frequencies expected given a particular minor allele frequency. If the minor allele a has frequency p and the major allele has frequency q = 1 − p, then the genotype frequencies should be p² for the homozygous minor allele (aa), 2pq for the heterozygotes (e.g., ab and ba), and q² for the homozygous major allele. Deviations from HWE are used to detect genotyping errors, deviations from random mating, and genetic drift.
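As an illustration of the HWE expectation in this note, the following minimal Python sketch computes the expected genotype frequencies for a given minor allele frequency and a Pearson chi-square statistic comparing observed genotype counts with those expectations. The function names and the example counts are hypothetical and chosen only for illustration; this is not the quality-control procedure used in the article.

```python
def hwe_expected_frequencies(p):
    """Expected genotype frequencies under HWE for minor allele frequency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    q = 1.0 - p  # major allele frequency
    return {
        "minor_homozygote": p * p,
        "heterozygote": 2.0 * p * q,
        "major_homozygote": q * q,
    }


def hwe_chi_square(n_minor_hom, n_het, n_major_hom):
    """Pearson chi-square statistic for departure from HWE at one marker."""
    n = n_minor_hom + n_het + n_major_hom
    if n == 0:
        raise ValueError("at least one genotype must be observed")
    # Estimate the minor allele frequency from the observed genotype counts.
    p = (2 * n_minor_hom + n_het) / (2 * n)
    if p in (0.0, 1.0):
        raise ValueError("marker is monomorphic; the statistic is undefined")
    expected = hwe_expected_frequencies(p)
    observed = {
        "minor_homozygote": n_minor_hom,
        "heterozygote": n_het,
        "major_homozygote": n_major_hom,
    }
    return sum(
        (observed[g] - n * expected[g]) ** 2 / (n * expected[g])
        for g in observed
    )


# Hypothetical example: 100 people with an excess of heterozygotes.
freqs = hwe_expected_frequencies(0.2)
stat = hwe_chi_square(0, 40, 60)
```

A large statistic relative to a chi-square distribution with one degree of freedom would flag the marker as departing from HWE.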
5 Specifically, the RAND fat files, available at http://www.rand.org/labor/aging/dataprod/enhanced-fat.html.
6 All variables except educational attainment were taken from Wave 8.
7 As noted previously, it is standard practice in GCTA to remove individuals from pairs with estimated genetic similarities greater than 0.025 (in the metric established by Eq. 1) to ensure that no closely related individuals (e.g., parent–offspring pairs or siblings) are included. Such individuals may share a common environment that may bias the resulting heritability estimate. However, we do not apply such a threshold here because changing the number of markers has major implications for the number of pairs that fall below this threshold. We did remove 347 individuals from these analyses, such that the genetic similarity estimates in the original set are all below the 0.025 threshold.
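The relatedness threshold described in this note can be sketched in Python as follows. This is only a hypothetical greedy procedure, assuming the genetic relationship matrix is available as a square, symmetric list of lists; it is not necessarily the algorithm used by the authors or by the GCTA software. The function name drop_related and the toy matrix are assumptions made for illustration.

```python
def drop_related(grm, cutoff=0.025):
    """Return the sorted indices of individuals kept after greedily removing
    people until no remaining pair has genetic similarity above the cutoff.

    grm: square, symmetric list of lists of pairwise genetic similarities.
    """
    n = len(grm)
    if any(len(row) != n for row in grm):
        raise ValueError("grm must be a square matrix")
    keep = set(range(n))
    while True:
        # Count, for each remaining person, how many remaining people
        # they are related to above the cutoff (diagonal is ignored).
        counts = {
            j: sum(1 for k in keep if k != j and grm[j][k] > cutoff)
            for j in keep
        }
        if not counts or max(counts.values()) == 0:
            break
        # Remove the person involved in the most above-cutoff pairs;
        # ties are broken in favor of the lowest index.
        worst = max(sorted(counts), key=lambda j: counts[j])
        keep.remove(worst)
    return sorted(keep)


# Toy matrix: person 1 is related to persons 0 and 2 above the cutoff.
toy_grm = [
    [1.00, 0.50, 0.00, 0.01],
    [0.50, 1.00, 0.03, 0.00],
    [0.00, 0.03, 1.00, 0.02],
    [0.01, 0.00, 0.02, 1.00],
]
kept = drop_related(toy_grm)
```

Removing the individual involved in the most above-threshold pairs first tends to retain more of the sample than dropping one member of each pair arbitrarily, which is the design choice this sketch illustrates.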