{"id":5134,"date":"2022-06-05T23:38:24","date_gmt":"2022-06-06T03:38:24","guid":{"rendered":"https:\/\/labs.icahn.mssm.edu\/minervalab\/?page_id=5134"},"modified":"2024-01-10T17:58:04","modified_gmt":"2024-01-10T22:58:04","slug":"reference-genome","status":"publish","type":"page","link":"https:\/\/labs.icahn.mssm.edu\/minervalab\/resources\/data-ark\/reference-genome\/","title":{"rendered":"Reference Genome"},"content":{"rendered":"<p>[et_pb_section fb_built=&#8221;1&#8243; fullwidth=&#8221;on&#8221; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;][et_pb_fullwidth_menu menu_id=&#8221;14&#8243; menu_style=&#8221;centered&#8221; fullwidth_menu=&#8221;on&#8221; active_link_color=&#8221;#d80b8c&#8221; dropdown_menu_bg_color=&#8221;#221f72&#8243; dropdown_menu_line_color=&#8221;#221f72&#8243; dropdown_menu_active_link_color=&#8221;#d80b8c&#8221; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221; menu_font=&#8221;|600|||||||&#8221; menu_text_color=&#8221;#FFFFFF&#8221; menu_font_size=&#8221;16px&#8221; background_color=&#8221;#221f72&#8243; background_layout=&#8221;dark&#8221; sticky_position=&#8221;top&#8221;][\/et_pb_fullwidth_menu][\/et_pb_section][et_pb_section fb_built=&#8221;1&#8243; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221; custom_padding=&#8221;0px||0px||false|false&#8221;][et_pb_row _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221; custom_padding=&#8221;||0px||false|false&#8221;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;][et_pb_text admin_label=&#8221;Breadcrumb&#8221; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;]<\/p>\n<p><a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/scientific-computing-and-data\/\">Scientific Computing and Data<\/a> \/ <a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/rds\/\">Research Data Services<\/a> \/\u00a0<a 
href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/resources\/data-ark\/\">Data Ark: Data Commons<\/a> \/ Reference Genome<\/p>\n<p>[\/et_pb_text][\/et_pb_column][\/et_pb_row][\/et_pb_section][et_pb_section fb_built=&#8221;1&#8243; admin_label=&#8221;section&#8221; _builder_version=&#8221;4.9.0&#8243; custom_padding=&#8221;0px||||false|false&#8221;][et_pb_row _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221; custom_margin=&#8221;|auto|-12px|auto||&#8221;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;][et_pb_text admin_label=&#8221;Header&#8221; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221; header_font=&#8221;|700|||||||&#8221; header_text_color=&#8221;#221f72&#8243;]<\/p>\n<h1><strong><span style=\"color: #000080\">Reference Genome and Annotation\u00a0<\/span><\/strong><\/h1>\n<p>[\/et_pb_text][\/et_pb_column][\/et_pb_row][et_pb_row column_structure=&#8221;1_5,3_5,1_5&#8243; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;][et_pb_column type=&#8221;1_5&#8243; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;][\/et_pb_column][et_pb_column type=&#8221;3_5&#8243; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;][et_pb_image src=&#8221;https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-content\/uploads\/sites\/342\/2022\/06\/1_HGP_draft_anniversary3.jpeg&#8221; title_text=&#8221;1_HGP_draft_anniversary3&#8243; admin_label=&#8221;Human genome project picture&#8221; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221; custom_margin=&#8221;-37px|||||&#8221;][\/et_pb_image][\/et_pb_column][et_pb_column type=&#8221;1_5&#8243; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;][\/et_pb_column][\/et_pb_row][et_pb_row _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221; custom_margin=&#8221;|auto|35px|auto||&#8221; 
custom_padding=&#8221;28px|||||&#8221;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;][et_pb_text admin_label=&#8221;Text&#8221; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221; text_line_height=&#8221;1.5em&#8221;]<\/p>\n<h2>Overview<\/h2>\n<p>Data Ark is building an accessible reference genome resource folder. The folder covers the most frequently used reference genome files (.fasta) and annotation files (.tdf).<\/p>\n<p>The genome files are downloaded from <a href=\"https:\/\/useast.ensembl.org\/index.html\">Ensembl Release 106<\/a>, and the annotation files are downloaded from <a href=\"https:\/\/useast.ensembl.org\/Homo_sapiens\/Info\/Index\">Ensembl<\/a>, <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/refseq\/\">RefSeq<\/a>, and <a href=\"https:\/\/www.gencodegenes.org\/human\/\">GENCODE<\/a>.<\/p>\n<p>The GENCODE and Ensembl gene annotations are the same, with one exception: genes common to the pseudoautosomal regions (PARs) of human chromosomes X and Y appear twice in the GENCODE GTF, whereas the Ensembl file lists them only on chromosome X.<\/p>\n<p>In general, the GENCODE\/Ensembl annotations are more comprehensive than RefSeq: they contain more exons, have greater genomic coverage, and capture many more variants in both genome and exome datasets. More information is available in <a href=\"https:\/\/bmcgenomics.biomedcentral.com\/articles\/10.1186\/1471-2164-16-S8-S2\">this paper<\/a>.<\/p>\n<p>For version control, we provide a &#8220;current&#8221; version, a symlink that always points to the most recent version of the file.<\/p>\n<p>&nbsp;[\/et_pb_text][\/et_pb_column][\/et_pb_row][et_pb_row column_structure=&#8221;1_5,3_5,1_5&#8243; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;][et_pb_column type=&#8221;1_5&#8243; _builder_version=&#8221;4.9.0&#8243; 
_module_preset=&#8221;default&#8221;][\/et_pb_column][et_pb_column type=&#8221;3_5&#8243; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;][et_pb_image src=&#8221;https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-content\/uploads\/sites\/342\/2022\/06\/Tree-Structure-final.png&#8221; title_text=&#8221;Tree-Structure-final&#8221; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;][\/et_pb_image][\/et_pb_column][et_pb_column type=&#8221;1_5&#8243; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;][\/et_pb_column][\/et_pb_row][et_pb_row _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;][et_pb_text admin_label=&#8221;Text&#8221; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221; text_line_height=&#8221;1.5em&#8221; hover_enabled=&#8221;0&#8243; sticky_enabled=&#8221;0&#8243;]<\/p>\n<h2>Access<\/h2>\n<p>Effective January 22, 2024, you must read, agree to, and sign the <a href=\"https:\/\/dataarkforms.hpc.mssm.edu\/\">Data Use Agreement<\/a> (you must be connected through the Mount Sinai campus network or the secure remote VPN). Access is granted within 24 hours. On Minerva, load the module with <strong>$ module load dataark<\/strong> to see the path variables.<\/p>\n<h2>Data Ark Data Sets<\/h2>\n<p>Please visit the <a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/resources\/data-ark\/data-ark-data-sets\/\">Data Ark Data Sets<\/a> webpage to explore other data sets.<\/p>\n<p>[\/et_pb_text][\/et_pb_column][\/et_pb_row][\/et_pb_section]<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Scientific Computing and Data \/ Research Data Services \/\u00a0Data Ark: Data Commons \/ Reference GenomeReference Genome and Annotation\u00a0Overview Data Ark is building an accessible reference genome resource folder. 
The folder covers the most frequently used reference genome files (.fasta) and annotation files (.tdf). The genome files are downloaded from Ensembl Release 106, and the [&hellip;]<\/p>\n","protected":false},"author":415,"featured_media":0,"parent":1321,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_et_pb_use_builder":"on","_et_pb_old_content":"","_et_gb_content_width":"","footnotes":""},"class_list":["post-5134","page","type-page","status-publish","hentry"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/pages\/5134","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/users\/415"}],"replies":[{"embeddable":true,"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/comments?post=5134"}],"version-history":[{"count":21,"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/pages\/5134\/revisions"}],"predecessor-version":[{"id":7870,"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/pages\/5134\/revisions\/7870"}],"up":[{"embeddable":true,"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/pages\/1321"}],"wp:attachment":[{"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/media?parent=5134"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}