{"id":1321,"date":"2020-07-07T09:54:20","date_gmt":"2020-07-07T13:54:20","guid":{"rendered":"https:\/\/labs.icahn.mssm.edu\/minervalab\/?page_id=1321"},"modified":"2026-02-06T22:31:52","modified_gmt":"2026-02-07T03:31:52","slug":"data-ark","status":"publish","type":"page","link":"https:\/\/labs.icahn.mssm.edu\/minervalab\/resources\/data-ark\/","title":{"rendered":"Data Ark: A Data Commons for Mount Sinai"},"content":{"rendered":"<p>[et_pb_section fb_built=&#8221;1&#8243; fullwidth=&#8221;on&#8221; _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_fullwidth_menu menu_id=&#8221;14&#8243; menu_style=&#8221;centered&#8221; fullwidth_menu=&#8221;on&#8221; active_link_color=&#8221;#d80b8c&#8221; dropdown_menu_line_color=&#8221;#221772&#8243; dropdown_menu_text_color=&#8221;#ffffff&#8221; _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; menu_font=&#8221;|600|||||||&#8221; menu_text_color=&#8221;#FFFFFF&#8221; menu_font_size=&#8221;16px&#8221; background_color=&#8221;#221f72&#8243; background_layout=&#8221;dark&#8221; custom_padding=&#8221;||||false|false&#8221; sticky_position=&#8221;top&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<p>[\/et_pb_fullwidth_menu][\/et_pb_section][et_pb_section fb_built=&#8221;1&#8243; _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; custom_padding=&#8221;0px||0px||false|false&#8221; collapsed=&#8221;off&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_row _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; custom_padding=&#8221;||0px||false|false&#8221; collapsed=&#8221;off&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_text admin_label=&#8221;Breadcrumb&#8221; _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; 
global_colors_info=&#8221;{}&#8221;]<\/p>\n<p><a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/scientific-computing-and-data\/\">Scientific Computing and Data<\/a> \/ <a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/rds\/\">Research Data Services<\/a> \/ Data Ark: Data Commons<\/p>\n<p>[\/et_pb_text][et_pb_text admin_label=&#8221;Data Ark Header&#8221; _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; header_font=&#8221;|700|||||||&#8221; header_text_color=&#8221;#221f72&#8243; header_3_text_color=&#8221;#221f72&#8243; global_colors_info=&#8221;{}&#8221;]<\/p>\n<h1>Data Ark: Data Commons for Mount Sinai<\/h1>\n<h3>Increasing the power, pace and relevance of our science<\/h3>\n<p>[\/et_pb_text][\/et_pb_column][\/et_pb_row][\/et_pb_section][et_pb_section fb_built=&#8221;1&#8243; _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_row column_structure=&#8221;3_4,1_4&#8243; _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; custom_padding=&#8221;0px|||||&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_column type=&#8221;3_4&#8243; _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_image src=&#8221;https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-content\/uploads\/sites\/342\/2021\/02\/Data_Ark_Final.jpg&#8221; title_text=&#8221;Data_Ark_Final&#8221; admin_label=&#8221;Image of the Ark&#8221; _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; custom_padding=&#8221;0px||||false|false&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<p>[\/et_pb_image][et_pb_text _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; custom_margin=&#8221;0px||||false|false&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<p><em>Image by Jessica Johnson \u00a9. 
See www.jessicajohnsonart.com<\/em><\/p>\n<p>[\/et_pb_text][\/et_pb_column][et_pb_column type=&#8221;1_4&#8243; _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;][\/et_pb_column][\/et_pb_row][et_pb_row _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_text admin_label=&#8221;Data Ark Goal text \/ Data Sets header&#8221; _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; text_line_height=&#8221;1.5em&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<p>The overarching goal of the Data Ark is to ensure that research data at Mount Sinai are managed, processed and combined in a way that optimizes the power, pace and relevance of our science.<\/p>\n<ul>\n<li>Power: Scientists typically use only a tiny fraction of available data<\/li>\n<li>Pace: Users will have rapid access to large, powerful research data sets<\/li>\n<li>Relevance: Our diverse patient population is ideal for testing the generalizability of our results<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h2><span style=\"color: #221f72\">Data Ark Data Sets (5\/7\/25)<\/span><\/h2>\n<p>The Data Ark is located on Minerva. The number, type, and diversity of its data sets will increase substantially in the coming months. The Data Ark consists of public data sets, Mount Sinai-generated data sets and school-acquired data sets. 
There are also some data supplements provided via Data Ark.<\/p>\n<p>[\/et_pb_text][\/et_pb_column][\/et_pb_row][et_pb_row column_structure=&#8221;1_3,1_3,1_3&#8243; use_custom_gutter=&#8221;on&#8221; gutter_width=&#8221;2&#8243; _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; background_color=&#8221;RGBA(0,0,0,0)&#8221; custom_margin=&#8221;||||false|false&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_column type=&#8221;1_3&#8243; _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_text _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<p><strong>Public Datasets (Unrestricted)<\/strong><\/p>\n<ul>\n<li><a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/resources\/data-ark\/1000-genomes\/\">1,000 Genomes Project<\/a><\/li>\n<li><a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/resources\/data-ark\/blast\/\">BLAST<\/a><\/li>\n<li><a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/resources\/data-ark\/eqtlgen\/\">eQTLGen<\/a><\/li>\n<li><a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/resources\/data-ark\/genebass\/\">Genebass<\/a><\/li>\n<li><a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/resources\/data-ark\/gnomad\/\">Genome Aggregation Database (gnomAD)<\/a><\/li>\n<li><a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/resources\/data-ark\/gwas-summary-statistics\/\">Genome-wide Association Study (GWAS) Summary Stats<\/a><\/li>\n<li><a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/resources\/data-ark\/gtex\/\">Genotype-Tissue Expression (GTEx) Project<\/a><\/li>\n<li><a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/resources\/data-ark\/ldscore\/\">Linkage Disequilibrium (LD) Score Regression Data<\/a><\/li>\n<li><a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/resources\/data-ark\/reference-genome\/\">Reference Genome<\/a><\/li>\n<li><a 
href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/resources\/data-ark\/tcga\/\">The Cancer Genome Atlas (TCGA)<\/a><\/li>\n<li><a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/resources\/data-ark\/ukbb_ld\/\">UK Biobank (UKBB)-Linkage Disequilibrium (LD)<\/a><\/li>\n<\/ul>\n<p>[\/et_pb_text][\/et_pb_column][et_pb_column type=&#8221;1_3&#8243; _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_text _builder_version=&#8221;4.27.4&#8243; _module_preset=&#8221;default&#8221; hover_enabled=&#8221;0&#8243; global_colors_info=&#8221;{}&#8221; sticky_enabled=&#8221;0&#8243;]<\/p>\n<p><strong>Mount Sinai Generated Datasets (Restricted)<\/strong><\/p>\n<ul>\n<li><a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/resources\/data-ark\/digital-pathology-slides\/\">De-identified Digital Pathology Slides<\/a><\/li>\n<li><a style=\"font-size: 14px\" href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/resources\/data-ark\/lbp\/\">Living Brain Project<\/a><\/li>\n<li><a style=\"font-size: 14px\" href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/resources\/data-ark\/mscic-covid-19-biobank\/\">Mount Sinai COVID-19 Biobank<\/a><\/li>\n<li><a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/resources\/data-ark\/mount-sinai-data-warehouse-covid-19-electronic-health-record-ehr-data-set\/\">Mount Sinai Data Warehouse (MSDW) De-identified COVID-19 Electronic Health Record (EHR) Data<\/a><\/li>\n<li><a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/resources\/data-ark\/mount-sinai-data-warehouse-msdw-de-identified-omop-data-set\/\">Mount Sinai Data Warehouse (MSDW) De-identified Observational Medical Outcomes Partnership (OMOP) Data<\/a><\/li>\n<li><a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/resources\/data-ark\/stop-covid-nyc-cohort\/\">STOP COVID NYC Cohort<\/a><\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p>[\/et_pb_text][\/et_pb_column][et_pb_column type=&#8221;1_3&#8243; _builder_version=&#8221;4.16&#8243; 
_module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;][\/et_pb_column][\/et_pb_row][et_pb_row _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; custom_margin=&#8221;0px||||false|false&#8221; custom_padding=&#8221;||0px|||&#8221; collapsed=&#8221;off&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_text admin_label=&#8221;Data Ark data sets&#8221; _builder_version=&#8221;4.27.4&#8243; _module_preset=&#8221;default&#8221; text_line_height=&#8221;1.5em&#8221; header_font=&#8221;|700|||||||&#8221; header_text_color=&#8221;#221f72&#8243; header_2_text_color=&#8221;#221f72&#8243; header_2_font_size=&#8221;22px&#8221; vertical_offset_tablet=&#8221;0&#8243; horizontal_offset_tablet=&#8221;0&#8243; custom_padding=&#8221;||||true&#8221; z_index_tablet=&#8221;0&#8243; text_text_shadow_horizontal_length_tablet=&#8221;0px&#8221; text_text_shadow_vertical_length_tablet=&#8221;0px&#8221; text_text_shadow_blur_strength_tablet=&#8221;1px&#8221; link_text_shadow_horizontal_length_tablet=&#8221;0px&#8221; link_text_shadow_vertical_length_tablet=&#8221;0px&#8221; link_text_shadow_blur_strength_tablet=&#8221;1px&#8221; ul_text_shadow_horizontal_length_tablet=&#8221;0px&#8221; ul_text_shadow_vertical_length_tablet=&#8221;0px&#8221; ul_text_shadow_blur_strength_tablet=&#8221;1px&#8221; ol_text_shadow_horizontal_length_tablet=&#8221;0px&#8221; ol_text_shadow_vertical_length_tablet=&#8221;0px&#8221; ol_text_shadow_blur_strength_tablet=&#8221;1px&#8221; quote_text_shadow_horizontal_length_tablet=&#8221;0px&#8221; quote_text_shadow_vertical_length_tablet=&#8221;0px&#8221; quote_text_shadow_blur_strength_tablet=&#8221;1px&#8221; header_text_shadow_horizontal_length_tablet=&#8221;0px&#8221; header_text_shadow_vertical_length_tablet=&#8221;0px&#8221; header_text_shadow_blur_strength_tablet=&#8221;1px&#8221; 
header_2_text_shadow_horizontal_length_tablet=&#8221;0px&#8221; header_2_text_shadow_vertical_length_tablet=&#8221;0px&#8221; header_2_text_shadow_blur_strength_tablet=&#8221;1px&#8221; header_3_text_shadow_horizontal_length_tablet=&#8221;0px&#8221; header_3_text_shadow_vertical_length_tablet=&#8221;0px&#8221; header_3_text_shadow_blur_strength_tablet=&#8221;1px&#8221; header_4_text_shadow_horizontal_length_tablet=&#8221;0px&#8221; header_4_text_shadow_vertical_length_tablet=&#8221;0px&#8221; header_4_text_shadow_blur_strength_tablet=&#8221;1px&#8221; header_5_text_shadow_horizontal_length_tablet=&#8221;0px&#8221; header_5_text_shadow_vertical_length_tablet=&#8221;0px&#8221; header_5_text_shadow_blur_strength_tablet=&#8221;1px&#8221; header_6_text_shadow_horizontal_length_tablet=&#8221;0px&#8221; header_6_text_shadow_vertical_length_tablet=&#8221;0px&#8221; header_6_text_shadow_blur_strength_tablet=&#8221;1px&#8221; box_shadow_horizontal_tablet=&#8221;0px&#8221; box_shadow_vertical_tablet=&#8221;0px&#8221; box_shadow_blur_tablet=&#8221;40px&#8221; box_shadow_spread_tablet=&#8221;0px&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<p>To find more information about the <a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/kpmp-data-ark\/\">Kidney Precision Medicine Project (KPMP), click here<\/a>.<\/p>\n<p>Data Ark also provides helpful links to external data sets.<\/p>\n<p><strong>Helpful External Data Sets<\/strong>: <a href=\"https:\/\/www.researchallofus.org\/\">All of Us<\/a><\/p>\n<p>To see more detail about each data set, including supplemental data sets, <a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/resources\/data-ark\/data-ark-data-sets\/\">click here<\/a>.<\/p>\n<p>\u00a0<\/p>\n<p>[\/et_pb_text][et_pb_button button_url=&#8221;https:\/\/labs.icahn.mssm.edu\/minervalab\/resources\/data-ark\/data-ark-data-sets\/&#8221; button_text=&#8221;More Data Sets&#8221; button_alignment=&#8221;center&#8221; admin_label=&#8221;More Data Sets 
button&#8221; _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; custom_button=&#8221;on&#8221; button_bg_use_color_gradient=&#8221;on&#8221; button_bg_color_gradient_direction=&#8221;224deg&#8221; button_bg_color_gradient_stops=&#8221;#00aeef 0%|#221f72 100%&#8221; button_bg_color_gradient_start=&#8221;#00aeef&#8221; button_bg_color_gradient_end=&#8221;#221f72&#8243; button_border_radius=&#8221;26px&#8221; button_font=&#8221;|600||on|||||&#8221; button_use_icon=&#8221;off&#8221; custom_margin=&#8221;20px||20px||false|false&#8221; custom_padding=&#8221;15px|30px|15px|30px|false|false&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<p>[\/et_pb_button][et_pb_text admin_label=&#8221;How to access + submit a ticket&#8221; _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; text_line_height=&#8221;1.5em&#8221; header_font=&#8221;|700|||||||&#8221; header_text_color=&#8221;#221f72&#8243; header_2_text_color=&#8221;#221f72&#8243; header_2_font_size=&#8221;22px&#8221; custom_padding=&#8221;||||true&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<h2>How can I access the data sets?<\/h2>\n<p>&nbsp;<\/p>\n<p>As of January 22, 2024, to access public, Mount Sinai-generated and restricted datasets, you must read, agree to, and sign the <a href=\"https:\/\/dataarkforms.hpc.mssm.edu\/\">Data Use Agreement <\/a>(you must be logged in through the Mount Sinai campus network or secure remote VPN). 
Access is granted within 24 hours. On Minerva, load the module with <strong>$ module load dataark <\/strong>to see the path variables.<\/p>\n<p>If you haven&#8217;t used Minerva before, please follow this <a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/request-an-account\/\">link<\/a> to register and see <a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/minerva-quick-start\/\">here<\/a> for quick start guidelines.<\/p>\n<p>To access the <strong>Data Use Agreement <\/strong>page,\u00a0<span>a Minerva account is required. Please follow these steps:<\/span><\/p>\n<ol>\n<li><a href=\"https:\/\/dataarkforms.hpc.mssm.edu\/\"><span>Click here<\/span><\/a> to access the Data Ark DUA Forms. You must be on the Mount Sinai campus network, or on the school VPN if off campus.<\/li>\n<li>Choose the data set that you would like to access from the drop-down list.<\/li>\n<li><span>Enter your Minerva username and password in the next prompt window (no VIP token needed).<\/span><\/li>\n<li>Follow the link to view and agree to the specific Data Use Agreement.<\/li>\n<li>You can choose only one data set at a time.<\/li>\n<\/ol>\n<p>&nbsp;<\/p>\n<h2>Contact the Team or Submit a Ticket<\/h2>\n<p>&nbsp;<\/p>\n<p><strong>We need your help to keep the Data Ark afloat: please report every grant submission, award and publication enabled by the Data Ark by emailing the details to <a href=\"mailto:hpchelp@hpc.mssm.edu\">hpchelp@hpc.mssm.edu<\/a>. Thanks so much for letting us know how the Data Ark has been useful!<\/strong><\/p>\n<p>&nbsp;<\/p>\n<p><strong>For more information <\/strong>and for all inquiries relating to the Data Ark, please email: <a href=\"mailto:hpchelp@hpc.mssm.edu\">hpchelp@hpc.mssm.edu<\/a>, or join our<strong> Data Ark Slack channel at <a href=\"https:\/\/join.slack.com\/t\/data-ark\/signup\"><span style=\"text-decoration: underline\">https:\/\/join.slack.com\/t\/data-ark\/signup <\/span><\/a><\/strong> and sign up using your Mount Sinai credentials. 
You will be able to interact with the researchers and the Data Ark group right away!<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>[\/et_pb_text][et_pb_button button_url=&#8221;mailto:data-ark-team@lists.mssm.edu&#8221; button_text=&#8221;Submit a Ticket&#8221; button_alignment=&#8221;center&#8221; admin_label=&#8221;Submit a Ticket Button&#8221; _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; custom_button=&#8221;on&#8221; button_bg_use_color_gradient=&#8221;on&#8221; button_bg_color_gradient_direction=&#8221;224deg&#8221; button_bg_color_gradient_stops=&#8221;#00aeef 0%|#221f72 100%&#8221; button_bg_color_gradient_start=&#8221;#00aeef&#8221; button_bg_color_gradient_end=&#8221;#221f72&#8243; button_border_radius=&#8221;26px&#8221; button_font=&#8221;|600||on|||||&#8221; button_use_icon=&#8221;off&#8221; custom_margin=&#8221;20px||20px||false|false&#8221; custom_padding=&#8221;15px|30px|15px|30px|false|false&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<p>[\/et_pb_button][et_pb_text admin_label=&#8221;What is Data Ark \/ Why Use \/ About Us&#8221; _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; text_line_height=&#8221;1.5em&#8221; header_font=&#8221;|700|||||||&#8221; header_text_color=&#8221;#221f72&#8243; header_2_text_color=&#8221;#221f72&#8243; header_2_font_size=&#8221;22px&#8221; custom_padding=&#8221;||||true&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<p>&nbsp;<\/p>\n<h2><span style=\"color: #221f72\">What is the Data Ark?<u> <\/u><\/span><\/h2>\n<ul>\n<li><em>Space on the Minerva Supercomputer<\/em> to host all frequent-use research data sets<\/li>\n<li><em>A team<\/em> of data scientists\/engineers to manage the resource, process data, simplify access process<\/li>\n<li><em>An opportunity <\/em>for a step-change in the power and pace of Sinai research<\/li>\n<\/ul>\n<p>This Mount Sinai data commons is guided by the FAIR principles [1]: making data more <em>findable<\/em>,<em> accessible<\/em>,<em> 
interoperable and reusable<\/em>. Data Ark includes both public (restricted and unrestricted) and Sinai-generated data sets.<\/p>\n<p>The Data Ark team downloads, organizes and performs quality assurance and quality control on the data. The team also manages the data access process, answers questions on the data, and updates the data sets to their latest versions. The Data Ark is located on Minerva at \/sc\/arion\/projects\/data-ark\/.<\/p>\n<p>&nbsp;<\/p>\n<h2><span style=\"color: #221f72\">Why use the Data Ark?<\/span><\/h2>\n<ul>\n<li>Increasing your sample size reduces false positives and boosts statistical power<\/li>\n<li>Analyzing new data sources allows testing the generalizability of your results and enables you to ask new scientific questions<\/li>\n<li>It will save you time otherwise spent locating, processing and correcting data<\/li>\n<li>Data quality is high because the dedicated Data Ark team processes the data and the many Sinai investigators who reuse it can detect and correct errors<\/li>\n<li>It reduces wasteful duplication of data sets<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h2><span style=\"color: #221f72\">Why share your data?<\/span><\/h2>\n<ul>\n<li>Data quality will be maximized by professional processing and repeat use<\/li>\n<li>Your lab will have more time for science rather than processing data<\/li>\n<li>The profile of your data set will be raised<\/li>\n<li>Expanded opportunities for citations and collaboration<\/li>\n<li>New ways of using your data will be highlighted<\/li>\n<li>Being a good data-sharer will be credited in faculty evaluations and by the appointments and promotions committee<\/li>\n<\/ul>\n<p><strong>Diverse research projects<\/strong> performed across Mount Sinai on <u>exactly the same large data<\/u> resource will foster effective collaboration and have the potential to dramatically increase the pace of our scientific and medical advances.<\/p>\n<p>&nbsp;<\/p>\n<h2><span style=\"color: 
#221f72\">About Us<\/span><\/h2>\n<p>Data Ark is an initiative led by Associate Professor Paul O\u2019Reilly and Dean for Scientific Computing and Data Patricia Kovatch, and supported by the Department of Genetics and Genomic Sciences and Scientific Computing and Data. An advisory board has been convened to provide guidance and to help Data Ark become sustainable over time.<\/p>\n<p>&nbsp;<\/p>\n<h1>Acknowledge CTSA<\/h1>\n<p>Please acknowledge CTSA as a funding source for Data Ark in your resulting publications, using the following statement.<\/p>\n<p><strong>&#8220;This work was supported in part through the computational resources and staff expertise provided by Scientific Computing and Data at the Icahn School of Medicine at Mount Sinai and supported by the Clinical and Translational Science Awards (CTSA) grant UL1TR004419 from the National Center for Advancing Translational Sciences.&#8221;<\/strong><\/p>\n<p>To associate the CTSA grant UL1TR004419 with an existing publication, please follow <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/books\/NBK53595\/#mybibliography.Associating_Funding_to_yo\" data-et-has-event-already=\"true\">these instructions<\/a> from the NIH (see the section \u201cAssociating Funding to your Publications\u201d).<\/p>\n<p>[\/et_pb_text][et_pb_button button_url=&#8221;https:\/\/labs.icahn.mssm.edu\/minervalab\/resources\/data-ark\/about-data-ark\/&#8221; button_text=&#8221;More About Data Ark&#8221; button_alignment=&#8221;center&#8221; admin_label=&#8221;More About Data Ark button&#8221; _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; custom_button=&#8221;on&#8221; button_bg_use_color_gradient=&#8221;on&#8221; button_bg_color_gradient_direction=&#8221;224deg&#8221; button_bg_color_gradient_stops=&#8221;#00aeef 0%|#221f72 100%&#8221; button_bg_color_gradient_start=&#8221;#00aeef&#8221; button_bg_color_gradient_end=&#8221;#221f72&#8243; button_border_radius=&#8221;26px&#8221; button_font=&#8221;|600||on|||||&#8221; 
button_use_icon=&#8221;off&#8221; custom_margin=&#8221;20px||20px||false|false&#8221; custom_padding=&#8221;15px|30px|15px|30px|false|false&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<p>[\/et_pb_button][et_pb_text admin_label=&#8221;Citation text footer&#8221; _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; text_line_height=&#8221;1.5em&#8221; header_font=&#8221;|700|||||||&#8221; header_text_color=&#8221;#221f72&#8243; header_2_text_color=&#8221;#221f72&#8243; header_2_font_size=&#8221;22px&#8221; custom_padding=&#8221;||||true&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<p><span style=\"color: #808080\"><a style=\"color: #808080\" href=\"#_ednref1\" name=\"_edn1\">&#8212;&#8212;&#8212;<\/a>&#8212;&#8212;&#8212;&#8212;&#8212;&#8211;<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"color: #808080\"><a style=\"color: #808080\" href=\"#_ednref1\" name=\"_edn1\">[1]<\/a> Wilkinson, M., Dumontier, M., Aalbersberg, I.\u00a0<em>et al.<\/em>\u00a0The FAIR Guiding Principles for scientific data management and stewardship.\u00a0<em>Sci Data<\/em>\u00a0<strong>3,\u00a0<\/strong>160018 (2016). https:\/\/doi.org\/10.1038\/sdata.2016.18<\/span><\/p>\n<p>[\/et_pb_text][\/et_pb_column][\/et_pb_row][\/et_pb_section]<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Scientific Computing and Data \/ Research Data Services \/ Data Ark: Data Commons Data Ark: Data Commons for Mount Sinai Increasing the power, pace and relevance of our science Image by Jessica Johnson \u00a9. 
See www.jessicajohnsonart.com The overarching goal of the Data Ark is to ensure that research data at Mount Sinai are managed, processed [&hellip;]<\/p>\n","protected":false},"author":457,"featured_media":0,"parent":48,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_et_pb_use_builder":"on","_et_pb_old_content":"<div><div><div><h3><i>Increasing the power, pace and relevance of our science<\/i><\/h3><\/div><\/div><p>\u00a0<\/p><p style=\"text-align: right;\"><img class=\"aligncenter size-full wp-image-1591\" src=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-content\/uploads\/sites\/342\/2021\/02\/Data_Ark_Final.jpg\" alt=\"\" width=\"1891\" height=\"1182\" \/><em>Image by Jessica Johnson \u00a9. See www.jessicajohnsonart.com<\/em><\/p><p>The overarching goal of the <strong>Data Ark<\/strong> is to ensure that research data at Mount Sinai are managed, processed and combined in a way that optimizes the power, pace and relevance of our science.<\/p><p><strong>Power: <\/strong>Scientists typically use only a tiny fraction of available data<br \/><strong>Pace: <\/strong>Users will have rapid access to huge, powerful research data<br \/><strong>Relevance: <\/strong>Our diverse patient population is ideal for testing the generalizability of our results<\/p><\/div><p>\u00a0<\/p><div><h2><span style=\"color: #221f72;\">Data Ark Data Sets, version 1 (3\/1\/21):<\/span><\/h2><p>As of launch, the Data Ark consists of the seven data sets listed below (click links for dedicated data set pages). We plan to expand the number, type and diversity of data sets over the next year. To help us prioritize the next data sets, we will issue a survey in March 2021. 
We look forward to your feedback through the survey or by contacting us directly at <a href=\"mailto:data-ark-team@lists.mssm.edu\">data-ark-team@lists.mssm.edu<\/a><\/p><\/div><p><strong><br \/>Public data sets (unrestricted):<\/strong><\/p><ul><li><span style=\"text-decoration: underline;\"><a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/1000-genomes\/\"><strong>1,000 Genomes Project<\/strong><\/a> <\/span>- Whole Genome Sequencing (WGS) data on ~1,000 individuals of mixed ancestry<\/li><li><strong><span style=\"text-decoration: underline;\"><a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/gtex\/\">GTEx<\/a><\/span><\/strong> - Gene expression data on hundreds of individuals across ~50 tissues<\/li><li><strong><a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/gwas-summary-statistics\/\"><u>GWAS Summary Stats<\/u> <\/a><\/strong>- Genome Wide Association Studies (GWAS) results in standardized format across 1,000s of outcomes<\/li><\/ul><p><strong>Public data sets (restricted): <\/strong><\/p><ul><li><span style=\"text-decoration: underline;\"><a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/uk-biobank\/\"><b>UK Biobank <\/b><\/a><\/span>- Genetic data (genotype\/WES) from the UK Biobank data on 500,000 individuals.<\/li><li><strong><u>TCGA data<\/u><\/strong> \u2013 COMING SOON!!<\/li><\/ul><p><strong>Mount Sinai generated data (unrestricted): <\/strong><\/p><ul><li><strong><a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/stop-covid-nyc-cohort\/\"><span style=\"text-decoration: underline;\">STOP COVID NYC Cohort\u00a0<\/span><\/a><\/strong>- symptom and behavior on COVID-19 on ~50,000 New York City residents surveyed via phone apps in April 2020<\/li><\/ul><p><strong>Mount Sinai generated data (restricted):<\/strong><\/p><ul><li><a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/mount-sinai-data-warehouse-covid-19-electronic-health-record-ehr-data-set\/\"><strong><span style=\"text-decoration: underline;\">Mount Sinai Data Warehouse COVID-19 
Electronic Health Record (EHR) Data Set<\/span><\/strong><\/a>\u00a0- de-identified clinical data on patients from Caboodle with or suspected of COVID-19 containing 350 data elements and updated daily<\/li><li><strong><a href=\"https:\/\/labs.icahn.mssm.edu\/minervalab\/mscic-covid-19-biobank\/\"><u>The Mount Sinai COVID-19 Biobank <\/u><\/a><\/strong>- blood samples from hundreds of COVID-19 patients hospitalized at Mount Sinai, with genotype\/WGS data available.<\/li><\/ul><p>\u00a0<\/p><div><h2><span style=\"color: #221f72;\">How can I access the data sets?<\/span><\/h2><p>You must read, agree and sign the <strong><em>Data Use Agreement <\/em><\/strong>specific to each data set that you want to access. Once the agreement has been submitted, as well as any evidence of approved permission for public restricted-use data, the Data Ark team will grant access within two working days. You will be notified by email that access has been granted.<\/p><p>To access the list of available data sets, you must be logged in through the Mount Sinai campus network or secure remote VPN. Please <a href=\"https:\/\/dataarkforms.hpc.mssm.edu\/\"><span style=\"text-decoration: underline;\"><strong>click here<\/strong><\/span><\/a> and choose the data set that you would like to access from the drop-down list. From here you can follow the link to view and agree to the specific data use agreement. You will need to login with your Sinai account and password. You will be able to only choose one data set at a time.<\/p><p><strong>Help us?<\/strong><span style=\"font-size: small;\"><br \/><\/span><strong>We need your help to keep the Data Ark afloat: please report every grant submission, award and publication enabled by the Data Ark by emailing us at data-ark-team@mssm.edu with the info. 
Thanks so much for letting us know how the Data Ark has been useful!<\/strong><\/p><p><strong>For more information<br \/><\/strong>For all inquiries relating to the Data Ark please email: <a href=\"mailto:data-ark-team@lists.mssm.edu\">data-ark-team@lists.mssm.edu<\/a><\/p><p>\u00a0<\/p><h2><span style=\"color: #221f72;\">What is the Data Ark?<strong><em><u><br \/><\/u><\/em><\/strong><\/span><\/h2><ul><li><em>Space on Minerva<\/em> to host all frequent-use research data sets<\/li><li><em>A team<\/em> of data scientists\/engineers to manage the resource, process data, simplify access process<\/li><li><em>An opportunity <\/em>for a step-change in the power and pace of Sinai research<\/li><\/ul><p>This Mount Sinai data commons is guided by the FAIR principles [1]: making data more <em>findable<\/em>,<em> accessible<\/em>,<em> interoperable and reusable<\/em>. Data Ark includes both public (restricted and unrestricted) and Sinai-generated data sets.<\/p><p>The Data Ark team downloads, organizes and performs quality assurance and quality control on the data. The team also manages the data access process, answers questions on the data, and updates to the latest versions of the data sets. 
The Data Ark is located on Minerva at \/sc\/arion\/projects\/data-ark\/.<\/p><p>\u00a0<\/p><h2><span style=\"color: #221f72;\">Why use the Data Ark?<\/span><\/h2><ul><li>Increasing your sample size reduces false-positives and boosts statistical power<\/li><li>Analyzing new data sources allows testing the generalizability of your results and enables you to ask new scientific questions<\/li><li>It will save you time otherwise spent locating, processing and correcting data<\/li><li>The data quality is extremely high due to its processing by the dedicated Data Ark team and its repeated use by many Sinai investigators able to detect and correct data errors<\/li><li>It reduces wasteful duplication of data sets<\/li><\/ul><p>\u00a0<\/p><h2><span style=\"color: #221f72;\">Why share your data?<\/span><\/h2><ul><li>Data quality will be maximized by professional processing and repeat use<\/li><li>Your lab will have more time for science rather than processing data<\/li><li>The profile of your data set will be raised<\/li><li>Expanded opportunities for citations and collaboration<\/li><li>New ways of using your data will be highlighted<\/li><li>Being a good data-sharer will be credited in faculty evaluations and by the appointments and promotions committee<\/li><\/ul><p><strong>Diverse research projects<\/strong> performed across Mount Sinai on <u>exactly the same large data<\/u> resource will foster effective collaboration and has the potential to dramatically increase the pace of our scientific and medical advances.<\/p><\/div><p>\u00a0<\/p><p>\u00a0<\/p><h2><span style=\"color: #221f72;\">About Us<\/span><\/h2><p>Data Ark is an initiative led by Associate Professor Paul O\u2019Reilly and Senior Associate Dean for Computing Patricia Kovatch, and supported by the Department of Genetics and Genomic Sciences and Scientific Computing. 
An advisory board has been convened to provide guidance and to help Data Ark become sustainable over time.<\/p><p>\u00a0<\/p><p>\u00a0<\/p><h6><span style=\"color: #808080;\"><a style=\"color: #808080;\" href=\"#_ednref1\" name=\"_edn1\">---------<\/a>-----------------<\/span><\/h6><p>\u00a0<\/p><h6><span style=\"color: #808080;\"><a style=\"color: #808080;\" href=\"#_ednref1\" name=\"_edn1\">[1]<\/a> Wilkinson, M., Dumontier, M., Aalbersberg, I.\u00a0<em>et al.<\/em>\u00a0The FAIR Guiding Principles for scientific data management and stewardship.\u00a0<em>Sci Data<\/em>\u00a0<strong>3,\u00a0<\/strong>160018 (2016). https:\/\/doi.org\/10.1038\/sdata.2016.18<\/span><\/h6>","_et_gb_content_width":"","footnotes":""},"class_list":["post-1321","page","type-page","status-publish","hentry"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/pages\/1321","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/users\/457"}],"replies":[{"embeddable":true,"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/comments?post=1321"}],"version-history":[{"count":231,"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/pages\/1321\/revisions"}],"predecessor-version":[{"id":13219,"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/pages\/1321\/revisions\/13219"}],"up":[{"embeddable":true,"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/pages\/48"}],"wp:attachment":[{"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/media?parent=1321"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}