GWAS Summary Statistics
Genome-wide association studies (GWAS) provide a powerful tool for identifying genetic loci associated with phenotypes of interest. The sharing of GWAS summary statistics has enabled a range of secondary research applications that do not require access to the individual-level data, such as gene prioritization, fine-mapping, pathway enrichment analyses, causal inference of exposures, risk prediction, genetic correlation and heritability estimation.
Several thousand GWAS summary statistics datasets, obtained from the IEU Open GWAS Project, are available in the Data Ark, including:
* ebi-a (n = 288): GWAS satisfying minimum requirements imported from the EBI database of complete GWAS summary data
* ieu-a (n = 440): GWAS generated by many different consortia that have been manually collected and curated, initially developed for MR-Base
* ieu-b (n = 37): GWAS generated by many different consortia that have been manually collected and curated, initially developed for MR-Base (round 2)
* ukb-b (n = 2514): IEU analysis of UK Biobank phenotypes
These GWAS are stored on the Data Ark in the GWAS-VCF format, which provides a consistent and robust approach to storing genetic variants, annotations and metadata, enabling interoperability and reusability consistent with the FAIR principles. Crucially, this ensures that all the provided GWAS are harmonized so that, e.g., the ALT allele corresponds to the effect allele, and that all the files use a consistent labeling scheme.
More details on the GWAS-VCF format (illustrated above) and the available open-source tools for working with GWAS-VCFs can be found in the corresponding publication: Lyon, M. et al. (2021). The variant call format provides efficient and robust storage of GWAS summary statistics. Genome Biol 22, 32. These tools are also included in the Data Ark.
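To make the harmonization convention concrete, here is a minimal Python sketch of how one data line of a single-study GWAS-VCF could be read. It assumes the per-study FORMAT keys ES (effect size), SE (standard error) and LP (-log10 p-value) described in the GWAS-VCF specification. The function name parse_gwas_vcf_record and the example record are hypothetical and made up for illustration; they are not taken from the Data Ark files.

```python
def parse_gwas_vcf_record(line):
    """Parse one single-study GWAS-VCF data line into a dict of summary statistics."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, variant_id, ref, alt = fields[:5]
    # Column 9 (FORMAT) names the per-study keys; column 10 holds their values.
    stats = dict(zip(fields[8].split(":"), fields[9].split(":")))
    return {
        "chrom": chrom,
        "pos": int(pos),
        "variant_id": variant_id,
        "effect_allele": alt,   # harmonized files: ALT is the effect allele
        "other_allele": ref,
        "beta": float(stats["ES"]),
        "se": float(stats["SE"]),
        "pval": 10 ** -float(stats["LP"]),  # LP is stored as -log10(p)
    }


# Made-up record, for illustration only (not real data).
example = "\t".join(
    ["1", "12345", "rs0000001", "A", "G", ".", "PASS", "AF=0.25",
     "ES:SE:LP:AF", "0.05:0.01:2.0:0.25"]
)
record = parse_gwas_vcf_record(example)
print(record["effect_allele"], record["beta"], record["pval"])
```

This sketch ignores header lines (those starting with "#"), compression, and multi-allelic or multi-study records, so for real analyses the dedicated GWAS-VCF tools described in the publication are likely the better choice.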
The GWAS summary statistics, as well as scripts for working with them, were uploaded to the Data Ark by Shea Andrews (firstname.lastname@example.org) on 01/16/21.
No DUA form is required to use this data. You can access the data at the following path on Minerva: /sc/arion/projects/data-ark/Public_Unrestricted/GWAS_SumStats. Alternatively, load the module with $ module load dataark to see the path variables.
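As a rough sketch of how the directory might be explored from Python, the snippet below counts files per batch (ebi-a, ieu-a, ieu-b, ukb-b). It rests on two assumptions that are not confirmed by this page: that the files end in .vcf.gz, and that each file is named after its OpenGWAS-style dataset ID (for example ukb-b-19953). The helper names batch_of and count_by_batch are hypothetical.

```python
from collections import Counter
from pathlib import Path

# Path given on this page; only present when running on Minerva.
GWAS_DIR = Path("/sc/arion/projects/data-ark/Public_Unrestricted/GWAS_SumStats")


def batch_of(dataset_id):
    """Return the batch prefix of an OpenGWAS-style ID, e.g. 'ukb-b-19953' -> 'ukb-b'."""
    return dataset_id.rsplit("-", 1)[0]


def count_by_batch(directory):
    """Count *.vcf.gz files under `directory`, grouped by batch prefix."""
    # Assumption: each file is named <dataset-id>.vcf.gz
    suffix = ".vcf.gz"
    ids = [p.name[: -len(suffix)] for p in Path(directory).rglob("*" + suffix)]
    return Counter(batch_of(i) for i in ids)


if __name__ == "__main__":
    if GWAS_DIR.is_dir():
        for batch, n in sorted(count_by_batch(GWAS_DIR).items()):
            print(batch, n)
    else:
        print("GWAS directory not found; are you on Minerva?")
```

If the actual layout differs (for example, one subdirectory per dataset), the glob pattern and the ID extraction would need to be adjusted accordingly.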