GWAS Summary Statistics
Genome-wide association studies (GWAS) are a powerful tool for identifying genetic loci associated with phenotypes of interest. Sharing GWAS summary statistics has enabled a range of secondary research applications that do not require access to individual-level data, such as gene prioritization, fine-mapping, pathway enrichment analysis, causal inference of exposures, risk prediction, and estimation of genetic correlation and heritability.
Several thousand GWAS summary statistics are available in the Data Ark, obtained from the IEU Open GWAS Project, including:
* ebi-a (n = 288): GWAS satisfying minimum requirements imported from the EBI database of complete GWAS summary data
* ieu-a (n = 440): GWAS generated by many different consortia that have been manually collected and curated, initially developed for MR-Base
* ieu-b (n = 37): GWAS generated by many different consortia that have been manually collected and curated, initially developed for MR-Base (round 2)
* ukb-b (n = 2514): IEU analysis of UK Biobank phenotypes
These GWAS are stored on the Data Ark in the GWAS-VCF format, which provides a consistent and robust approach to storing genetic variants, annotations and metadata, enabling interoperability and reusability consistent with the FAIR principles. Crucially, this ensures that all the provided GWAS are harmonized so that, e.g., the ALT allele corresponds to the effect allele, and that all the files use a consistent labeling scheme.
More details on the GWAS-VCF format (illustrated above) and the available open-source tools for working with GWAS-VCFs, which are included in the Data Ark, can be found in the corresponding publication: Lyon, M. et al. (2021). The variant call format provides efficient and robust storage of GWAS summary statistics. Genome Biol 22, 32.
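As a rough illustration of what this harmonization means in practice, the sketch below parses a single GWAS-VCF data line using only the Python standard library. It assumes a single-trait file whose per-variant statistics sit in the sample column under the FORMAT keys ES (effect size relative to ALT), SE (standard error), LP (-log10 p-value) and AF (ALT allele frequency), as we understand the GWAS-VCF specification. The function name `parse_gwas_vcf_line` and the example line are hypothetical and the values are made up. For real analyses, the tools described in the publication are the better choice.

```python
def _to_float(value):
    """Convert a VCF field to float, treating '.' (missing) as None."""
    return None if value in (".", "") else float(value)


def parse_gwas_vcf_line(line):
    """Parse one tab-delimited GWAS-VCF data line (single-trait file assumed).

    Assumed columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, <trait>.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, rsid, ref, alt = fields[:5]

    # FORMAT holds colon-separated keys; the trait column holds matching values.
    stats = dict(zip(fields[8].split(":"), fields[9].split(":")))

    lp = _to_float(stats.get("LP", "."))
    return {
        "chrom": chrom,
        "pos": int(pos),
        "rsid": rsid,
        "effect_allele": alt,   # harmonized: ALT is the effect allele
        "other_allele": ref,
        "beta": _to_float(stats.get("ES", ".")),
        "se": _to_float(stats.get("SE", ".")),
        "eaf": _to_float(stats.get("AF", ".")),
        # LP is stored as -log10(p), so convert back to a p-value.
        "pval": None if lp is None else 10 ** (-lp),
    }


# Made-up example line for illustration only (not taken from any Data Ark file).
example_line = "\t".join(
    ["1", "49298", "rs0000001", "T", "C", ".", "PASS", ".",
     "ES:SE:LP:AF", "0.01:0.02:2.0:0.62"]
)
record = parse_gwas_vcf_line(example_line)
```

Because every file follows the same convention, the `effect_allele` and `beta` fields extracted this way should be directly comparable across studies without per-file allele flipping.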
The GWAS summary statistics, as well as scripts for working with them, were uploaded to the Data Ark by Shea Andrews (email@example.com) on 01/16/21.
To use this data, you must read, agree to, and sign the Data Use Agreement (you must be logged in through the Mount Sinai campus network or secure remote VPN).