We have developed and implemented novel methods for causal analysis of complex disease. We developed a modified Mendelian randomization approach that isolates the causal influences of individual risk factors within a set of correlated risk factors, and applied it to plasma lipids and coronary artery disease (CAD) (Do et al. Nature Genetics. 2013). Specifically, we used this approach to investigate the causal influence of plasma triglycerides on CAD. The approach uses effect-size estimates for genetic variants obtained from a meta-analysis of association results from the latest genome-wide association study for plasma lipids and CAD. Using a model that accounts for a variant's effects on low-density lipoprotein cholesterol (LDL-C) and/or high-density lipoprotein cholesterol (HDL-C), we found that the strength of a polymorphism’s effect on triglyceride levels is correlated with the magnitude of its effect on CAD risk. These results provide evidence that triglyceride-rich lipoproteins may causally influence risk for CAD and suggest that efforts to develop therapeutic targets for CAD should focus in part on plasma triglycerides.
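The logic of adjusting a variant's triglyceride effect for its LDL-C and HDL-C effects can be illustrated with a small sketch. This is not the published analysis: it assumes a plain, unweighted least-squares regression through the origin of per-variant CAD effects on per-variant lipid effects, and it runs on synthetic effect sizes generated inside the script. The function names (solve_linear_system, multivariable_mr) and the assumed effect values are hypothetical and chosen only for illustration.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left unchanged.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def multivariable_mr(exposure_betas, outcome_betas):
    """Regress outcome effects on several exposure effects jointly.

    exposure_betas: one row per variant, one column per exposure.
    outcome_betas: one outcome effect per variant.
    Assumption: unweighted least squares with no intercept.
    """
    k = len(exposure_betas[0])
    xtx = [
        [sum(row[i] * row[j] for row in exposure_betas) for j in range(k)]
        for i in range(k)
    ]
    xty = [
        sum(row[i] * y for row, y in zip(exposure_betas, outcome_betas))
        for i in range(k)
    ]
    return solve_linear_system(xtx, xty)


# Synthetic per-variant effects (columns: TG, LDL-C, HDL-C). Not real data.
rng = random.Random(0)
exposure_betas = [[rng.gauss(0.0, 0.05) for _ in range(3)] for _ in range(200)]
assumed_effects = [0.4, 0.5, 0.0]  # hypothetical causal effects used to simulate
cad_betas = [
    sum(effect * beta for effect, beta in zip(assumed_effects, row))
    + rng.gauss(0.0, 0.005)
    for row in exposure_betas
]

estimates = multivariable_mr(exposure_betas, cad_betas)
for name, value in zip(["TG", "LDL-C", "HDL-C"], estimates):
    print(f"{name}: {value:.3f}")
```

In this sketch, a triglyceride coefficient that remains clearly nonzero after the LDL-C and HDL-C columns are included corresponds to the pattern described above. A real analysis would typically also weight each variant by the precision of its CAD effect estimate.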
We have used sequencing data generated by various medical genetics projects to pursue questions in population genetics. The Exome Sequencing Project showed that the majority of coding variation in the human genome is rare and deleterious. Furthermore, the study demonstrated that there is an overabundance of rare coding variation beyond what theory predicts, and that this excess is best explained by recent, accelerated population growth. Building on this work, we developed a population genetics framework that uses the accumulation of deleterious coding mutations in both ancient and modern humans to make inferences about the role of natural selection in diverse human populations. Using this framework, we showed that there is no evidence that natural selection has been less effective at removing deleterious mutations in Europeans than in West Africans (Do et al. Nature Genetics. 2015). Furthermore, we have shown that our framework can be used as a test to determine the mode of selection for gene sets (Balick et al. PLOS Genetics. 2015).
We have used exome sequencing to discover new genes related to blood lipids and heart attack. We constructed one of the first robust statistical pipelines for analyzing exome sequencing data in families with Mendelian lipid disorders and discovered compound heterozygous nonsense mutations in the ANGPTL3 gene as a cause of familial combined hypolipidemia (Musunuru*, Pirruccello*, Do* et al. New England Journal of Medicine. 2010). We expanded upon this work by constructing an analytical pipeline to process exome sequencing data for 10,000 heart attack cases and controls as part of the National Heart, Lung, and Blood Institute Exome Sequencing Project (ESP). Using this analytical pipeline, we identified a burden of rare mutations in the APOA5 and LDLR genes as conferring risk for heart attack (Do*, Stitziel*, Won* et al. Nature. 2015). This finding provides proof of principle that exome sequencing can be used to discover rare mutations that contribute to risk for a complex disease. As a result of this research, we have also contributed insights to the literature on how best to design and conduct rare variant association studies (Do et al. Human Molecular Genetics. 2012).
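The idea of a rare-variant burden can be illustrated with a minimal sketch. It is not the pipeline used in the study: it assumes the simplest collapsing design, in which each person is scored as a carrier if they hold any rare allele in a gene, and carrier counts in cases and controls are compared with a two-sided Fisher's exact test. The function names and the toy genotypes below are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a, b: case carriers and case non-carriers.
    c, d: control carriers and control non-carriers.
    """
    cases = a + b
    controls = c + d
    carriers = a + c
    total = cases + controls

    def prob(x):
        # Hypergeometric probability of x carriers among the cases.
        return comb(cases, x) * comb(controls, carriers - x) / comb(total, carriers)

    lowest = max(0, carriers - controls)
    highest = min(cases, carriers)
    observed = prob(a)
    # Sum all tables that are at most as likely as the observed one.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


def burden_test(case_genotypes, control_genotypes):
    """Collapse rare alleles within one gene and test carrier status.

    Each genotype is a list of rare-allele counts, one entry per variant site.
    Returns (case carriers, control carriers, two-sided p-value).
    """
    case_carriers = sum(1 for g in case_genotypes if any(g))
    control_carriers = sum(1 for g in control_genotypes if any(g))
    p_value = fisher_exact_two_sided(
        case_carriers,
        len(case_genotypes) - case_carriers,
        control_carriers,
        len(control_genotypes) - control_carriers,
    )
    return case_carriers, control_carriers, p_value


# Toy genotypes for a single gene with two rare variant sites. Not real data.
toy_cases = [[1, 0], [0, 1], [0, 0], [1, 0]]
toy_controls = [[0, 0], [0, 0], [0, 1], [0, 0]]
result = burden_test(toy_cases, toy_controls)
print(result)
```

Collapsing variants within a gene trades resolution for power: no single rare variant is seen often enough to test on its own, but the combined carrier count can be. Study-scale analyses would normally also restrict to predicted damaging variants and adjust for covariates such as ancestry, which this sketch omits.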
We have systematically investigated the biological link between genetic variants associated with risk of myocardial infarction (MI) and CAD and their impacts on gene function (Won et al. PLOS Genetics. 2015). We examined the molecular consequences of CAD-associated common variants by integrating findings from genome-wide association studies for CAD with functional genomics data from the NHGRI ENCODE Project and the NIH Roadmap Epigenomics Project. We partitioned the genetic risk of MI/CAD into different categories to discern drivers in specific cell types that may biologically influence MI/CAD. We investigated components of polygenicity and heritability in distinct genomic compartments and across diverse cell types within three histone modification marks. We found that: (1) genetic variants residing in noncoding regions flanking protein-coding genes account for a large proportion of the heritability of MI/CAD; (2) association signals are enriched in histone modification marks; and (3) clear cell-type-specific effects emerged, with the genetic effects of MI/CAD-associated SNPs enhanced in adipocyte, brain, and spleen cell lines. These results highlight the role of tissue-specific regulatory mechanisms in the genetic etiology of MI/CAD.