{"id":1325,"date":"2020-07-07T09:58:41","date_gmt":"2020-07-07T13:58:41","guid":{"rendered":"https:\/\/labs.icahn.mssm.edu\/minervalab\/?page_id=1325"},"modified":"2024-01-10T17:41:29","modified_gmt":"2024-01-10T22:41:29","slug":"1000-genomes","status":"publish","type":"page","link":"https:\/\/labs.icahn.mssm.edu\/minervalab\/resources\/data-ark\/1000-genomes\/","title":{"rendered":"1000 Genomes Project"},"content":{"rendered":"<p>[et_pb_section fb_built=&#8221;1&#8243; fullwidth=&#8221;on&#8221; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;][et_pb_fullwidth_menu menu_id=&#8221;14&#8243; menu_style=&#8221;centered&#8221; fullwidth_menu=&#8221;on&#8221; active_link_color=&#8221;#d80b8c&#8221; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221; menu_font=&#8221;|600|||||||&#8221; menu_text_color=&#8221;#FFFFFF&#8221; menu_font_size=&#8221;16px&#8221; background_color=&#8221;#221f72&#8243; background_layout=&#8221;dark&#8221; sticky_position=&#8221;top&#8221;][\/et_pb_fullwidth_menu][\/et_pb_section][et_pb_section fb_built=&#8221;1&#8243; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221; custom_padding=&#8221;0px||0px||false|false&#8221;][et_pb_row _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221; custom_padding=&#8221;||0px||false|false&#8221;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;][et_pb_text admin_label=&#8221;Breadcrumb&#8221; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;]<\/p>\n<p><a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/scientific-computing-and-data\/\">Scientific Computing and Data<\/a> \/ <a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/rds\/\">Research Data Services<\/a> \/\u00a0<a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/resources\/data-ark\/\">Data Ark: Data Commons<\/a> \/ 1,000 Genomes 
Project<\/p>\n<p>[\/et_pb_text][\/et_pb_column][\/et_pb_row][\/et_pb_section][et_pb_section fb_built=&#8221;1&#8243; admin_label=&#8221;section&#8221; _builder_version=&#8221;4.9.0&#8243; width=&#8221;100%&#8221; custom_padding=&#8221;0px||||false|false&#8221;][et_pb_row _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221; custom_margin=&#8221;|auto|-12px|auto||&#8221;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;][et_pb_text admin_label=&#8221;Header&#8221; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;]<\/p>\n<h1><strong><span style=\"color: #000080\">1,000 Genomes Project<\/span><\/strong><\/h1>\n<p>[\/et_pb_text][\/et_pb_column][\/et_pb_row][et_pb_row column_structure=&#8221;1_5,3_5,1_5&#8243; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;][et_pb_column type=&#8221;1_5&#8243; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;][\/et_pb_column][et_pb_column type=&#8221;3_5&#8243; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;][et_pb_image src=&#8221;https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-content\/uploads\/sites\/342\/2021\/01\/da2.png&#8221; title_text=&#8221;da2&#8243; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221; custom_margin=&#8221;-37px|||||&#8221;][\/et_pb_image][\/et_pb_column][et_pb_column type=&#8221;1_5&#8243; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;][\/et_pb_column][\/et_pb_row][et_pb_row _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221; custom_margin=&#8221;|auto|35px|auto||&#8221; custom_padding=&#8221;28px|||||&#8221;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;][et_pb_text admin_label=&#8221;Body text&#8221; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221; 
text_line_height=&#8221;1.5em&#8221;]<\/p>\n<h2>Overview<\/h2>\n<p>The 1000 Genomes Project was initiated in 2008, comprising 3 pilot phases that focused on low coverage whole genome sequencing (WGS) of 180 individuals of African\/Asian\/European ancestry, and deep coverage of two trios and of 1000 genes in 900 unrelated samples. These pilot studies were expanded to larger projects, reported in three <em>Nature<\/em> papers published in <a href=\"http:\/\/www.nature.com\/nature\/journal\/v467\/n7319\/full\/nature09534.html\">2010<\/a>, <a href=\"https:\/\/www.nature.com\/articles\/nature11632\">2012<\/a>, and <a href=\"https:\/\/www.nature.com\/articles\/nature15394\">2015<\/a>, with the final data set corresponding to 2,504 individuals from 26 global populations (~4X WGS and 30X WES for all, and 24 individuals with 30X WGS for validation).[\/et_pb_text][\/et_pb_column][\/et_pb_row][et_pb_row column_structure=&#8221;1_5,3_5,1_5&#8243; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;][et_pb_column type=&#8221;1_5&#8243; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;][\/et_pb_column][et_pb_column type=&#8221;3_5&#8243; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;][et_pb_image src=&#8221;https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-content\/uploads\/sites\/342\/2021\/01\/1000.png&#8221; title_text=&#8221;1000&#8243; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221; custom_margin=&#8221;-48px|||||&#8221;][\/et_pb_image][\/et_pb_column][et_pb_column type=&#8221;1_5&#8243; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;][\/et_pb_column][\/et_pb_row][et_pb_row _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;][et_pb_text admin_label=&#8221;Access text&#8221; _builder_version=&#8221;4.9.0&#8243; 
_module_preset=&#8221;default&#8221; text_line_height=&#8221;1.5em&#8221; hover_enabled=&#8221;0&#8243; sticky_enabled=&#8221;0&#8243;]<\/p>\n<p>Since its completion in 2013, the 1000 Genomes Project has been superseded by the International Genome Sample Resource (IGSR), which was established to host and extend the 1000G data. For more information on the 1000G Project and the IGSR, see the <a href=\"https:\/\/www.internationalgenome.org\/home\">1000G &amp; IGSR website<\/a>.<\/p>\n<p>The Data Ark hosts a replica of the Phase 3 individual-level called genotype data (VCF format) created by Google Health using high coverage (30X) Illumina sequencing performed by the\u00a0<a href=\"https:\/\/www.internationalgenome.org\/\">New York Genome Center<\/a>.\u00a0The CRAM files can be obtained from the 1000G website, but they are extremely large (36 TB) and are not needed for the vast majority of use cases, so they are not included on the Data Ark.<\/p>\n<h2>Access<\/h2>\n<p>Effective January 22, 2024, you must read, agree to, and sign the <a href=\"https:\/\/dataarkforms.hpc.mssm.edu\/\">Data Use Agreement<\/a> (you must be logged in through the Mount Sinai campus network or secure remote VPN). 
Access is granted within 24 hours. On Minerva, run <strong>$ module load dataark<\/strong> to see the path variables.<\/p>\n<h2>Data Ark Data Sets<\/h2>\n<p>Please visit the <a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/resources\/data-ark\/data-ark-data-sets\/\">Data Ark Data Sets<\/a> webpage to explore other data sets.<\/p>\n<p>[\/et_pb_text][\/et_pb_column][\/et_pb_row][\/et_pb_section]<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Scientific Computing and Data \/ Research Data Services \/\u00a0Data Ark: Data Commons \/ 1,000 Genomes Project1,000 Genomes ProjectOverview The 1000 Genomes Project was initiated in 2008, comprising 3 pilot phases that focused on low coverage whole genome sequencing (WGS) of 180 individuals of African\/Asian\/European ancestry, and deep coverage of two trios and of 1000 genes [&hellip;]<\/p>\n","protected":false},"author":415,"featured_media":0,"parent":1321,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_et_pb_use_builder":"on","_et_pb_old_content":"\u00a0\r\n\r\n<img class=\"aligncenter wp-image-1513 size-full\" src=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-content\/uploads\/sites\/342\/2021\/01\/da2.png\" alt=\"\" width=\"941\" height=\"491\" \/>\r\n<h4><strong>1000 Genomes Project<\/strong><\/h4>\r\nThe 1000 Genomes Project was initiated in 2008, comprising 3 pilot phases that focused on low coverage whole genome sequencing (WGS) of 180 individuals of African\/Asian\/European ancestry, and deep coverage of two trios and of 1000 genes in 900 unrelated samples. 
These pilot studies were expanded to larger projects, reported in three <em>Nature<\/em> papers published in <a href=\"http:\/\/www.nature.com\/nature\/journal\/v467\/n7319\/full\/nature09534.html\">2010<\/a>, <a href=\"https:\/\/www.nature.com\/articles\/nature11632\">2012<\/a>, and <a href=\"https:\/\/www.nature.com\/articles\/nature15394\">2015<\/a>, with the final data set corresponding to 2,504 individuals from 26 global populations (~4X WGS and 30X WES for all, and 24 individuals with 30X WGS for validation).<img class=\"aligncenter size-full wp-image-1534\" src=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-content\/uploads\/sites\/342\/2021\/01\/1000.png\" alt=\"\" width=\"470\" height=\"134\" \/>\r\n\r\nSince its completion in 2013, the 1000 Genomes Project has been superseded by the International Genome Sample Resource (IGSR), which was established to host and extend the 1000G data. For more information on the 1000G Project and the IGSR, see the <a href=\"https:\/\/www.internationalgenome.org\/home\">1000G & IGSR website<\/a>.\r\n\r\nThe Data Ark hosts a replica of the Phase 3 individual-level called genotype data (VCF format) created by Google Health using high coverage (30X) Illumina sequencing performed by the\u00a0<a href=\"https:\/\/www.internationalgenome.org\/\">New York Genome Center<\/a>\u00a0and called using DeepVariant (detailed methods about the variant calling pipeline can be found\u00a0<a href=\"https:\/\/pubmed.ncbi.nlm.nih.gov\/33399819\/\">here<\/a>). 
The CRAM files can be obtained from the 1000G website, but they are extremely large (36 TB) and are not needed for the vast majority of use cases, so they are not included on the Data Ark.\r\n\r\nTo use this data, you must read, agree to, and sign the <a href=\"https:\/\/dataarkforms.hpc.mssm.edu\/\"><strong><em>Data Use Agreement here<\/em><\/strong><\/a>.","_et_gb_content_width":"","footnotes":""},"class_list":["post-1325","page","type-page","status-publish","hentry"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/pages\/1325","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/users\/415"}],"replies":[{"embeddable":true,"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/comments?post=1325"}],"version-history":[{"count":58,"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/pages\/1325\/revisions"}],"predecessor-version":[{"id":7860,"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/pages\/1325\/revisions\/7860"}],"up":[{"embeddable":true,"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/pages\/1321"}],"wp:attachment":[{"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/media?parent=1325"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}