Scientific Computing and Data

Partnering with researchers to advance scientific discovery

Data Ark Data Sets, Version 1

As of launch, the Data Ark consists of the seven data sets listed below (click links for dedicated data set pages). We plan to expand the number, type and diversity of data sets over the next year.

Public data sets (unrestricted):

  • 1000 Genomes Project – Whole Genome Sequencing (WGS) data on ~1,000 individuals of mixed ancestry
  • GTEx – Gene expression data on hundreds of individuals across ~50 tissues
  • GWAS Summary Stats – Genome-Wide Association Study (GWAS) results in a standardized format across thousands of outcomes

Public data sets (restricted):

  • UK Biobank – Genetic data (genotype/WES) on 500,000 individuals from the UK Biobank
  • TCGA – coming soon

Mount Sinai generated data (unrestricted):

  • STOP COVID NYC Cohort – COVID-19 symptom and behavior data on ~50,000 New York City residents surveyed via phone apps in April 2020

Mount Sinai generated data (restricted):


Access Data Ark

All users must read, agree to, and sign the Data Use Agreement specific to the requested data set. Once the agreement has been submitted, along with evidence of approved permission in the case of restricted public data sets, the Data Ark team will grant access within two working days. Users will receive email confirmation once access has been granted.

The Data Use Agreement is accessible only through the Mount Sinai campus network or secure remote VPN. Click here for the Data Use Agreement and choose the data set you would like to access from the drop-down list, then follow the link to view and agree to the specific Data Use Agreement. You will need to log in with your Sinai account and password, and you can choose only one data set at a time.

For all inquiries relating to the Data Ark please email: