About Data Ark
The Data Ark team downloads and organizes the data and performs quality assurance and quality control on it. The team also manages the data access process, answers questions about the data, and updates the data sets to their latest versions. The Data Ark is located on Minerva at /sc/arion/projects/data-ark/. This Mount Sinai data commons is guided by the FAIR principles [1]: making data more findable, accessible, interoperable and reusable. Data Ark includes both public (restricted and unrestricted) and Sinai-generated data sets.
The overarching goal of the Data Ark is to ensure that research data at Mount Sinai are managed, processed and combined in a way that optimizes the power, pace and relevance of our science.
- Power: Scientists typically use only a tiny fraction of available data
- Pace: Users will have rapid access to large, high-value research data sets
- Relevance: Our diverse patient population is ideal for testing the generalizability of our results
Data Ark is an initiative led by Associate Professor Paul O’Reilly and Dean for Scientific Computing and Data Patricia Kovatch, and supported by the Department of Genetics and Genomic Sciences and Scientific Computing. An advisory board has been convened to provide guidance and to help Data Ark become sustainable over time.
We are supported by grant UL1TR004419 from the National Center for Advancing Translational Sciences, National Institutes of Health.
Access Data Ark
Effective January 22, 2024, to access public, Mount Sinai-generated and restricted data sets, you must read, agree to and sign the Data Use Agreement (you must be connected through the Mount Sinai campus network or secure remote VPN). Access is granted within 24 hours. On Minerva, you can then run the command module load dataark to see the path variables.
The Data Use Agreement is accessible only through the Mount Sinai campus network or secure remote VPN. Click here for the Data Use Agreement and choose the data set that you would like to access from the drop-down list. From there you can follow the link to view and agree to the specific Data Use Agreement. You will need to log in with your Sinai account and password, and you can choose only one data set at a time.
For more information and for all inquiries relating to the Data Ark, please email hpchelp@hpc.mssm.edu, or join our Data Ark Slack channel at https://join.slack.com/t/data-ark/signup and sign up using your Mount Sinai credentials. You will be able to interact with the researchers and the Data Ark group right away!
Data Ark User Feedback
We ask Data Ark users for feedback on features and the availability of data sets, and we solicit recommendations for improvement over time. Here are some specific recommendations and comments from Data Ark users:
- 2023 User Comments and Feedback
- 2022 User Comments and Feedback
- 2021 User Comments and Feedback
Data Ark Support Materials
Scientific Computing and Data hosts Data Ark Town Hall and training sessions that are open to current and prospective Data Ark users. Here are the session archives:
- Introduction to Data Ark – October 2024 (Recording)
- Introduction to Data Ark – October 2024 (PowerPoint Slides)
- Introduction to Data Ark – Mount Sinai Data Commons – April 2024 (Recording)
- Introduction to Data Ark – Mount Sinai Data Commons – April 2024 (PowerPoint Slides)
- Data Ark Town Hall – October 2023 (Recording)
- Data Ark Town Hall – October 2023 (PowerPoint Slides)
- Data Ark Town Hall – May 2023 (Recording)
- Data Ark Town Hall – May 2023 (PowerPoint Slides)
- Data Ark Town Hall – December 2022 (PowerPoint Slides)
- Data Ark Town Hall – December 2022 (Recording)
- Data Ark Town Hall – May 2022 (PowerPoint Slides)
- Data Ark Town Hall – May 2022 (Recording)
Data Sets
The Data Ark is located on Minerva and the number, type, and diversity of data sets on the Data Ark are increasing on an ongoing basis.
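Once access has been granted, one way to check which data sets your account can read is to list the top-level directories under the Data Ark path on Minerva. The following is a minimal sketch, not an official Data Ark tool. It assumes that each data set sits in its own top-level directory under /sc/arion/projects/data-ark/, which this page does not confirm, and the helper name list_visible_datasets is hypothetical.

```python
from pathlib import Path

# Data Ark location on Minerva, as given on this page.
DATA_ARK_ROOT = Path("/sc/arion/projects/data-ark/")


def list_visible_datasets(root: Path = DATA_ARK_ROOT) -> list[str]:
    """Return the names of readable top-level directories under root.

    Assumption (not confirmed by the Data Ark documentation): each data
    set has its own top-level directory. Returns an empty list when root
    does not exist, for example when run outside Minerva.
    """
    if not root.is_dir():
        return []
    visible = []
    for entry in sorted(root.iterdir()):
        try:
            if entry.is_dir():
                # Reading the first entry raises PermissionError when
                # access to this directory has not been granted.
                next(entry.iterdir(), None)
                visible.append(entry.name)
        except PermissionError:
            continue
    return visible


if __name__ == "__main__":
    for name in list_visible_datasets():
        print(name)
```

Directories that raise a permission error are skipped rather than reported, so the output reflects only what the current account can actually open.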
Onboarding Data Ark Data Sets
PIs must complete a REDCap form and name the expected research groups. The approval process depends on data set size:
- 1 TB or less: approved by the Data Ark operations team
- More than 1 TB: must be approved by the Data Ark Advisory Board
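The size rule above can be expressed as a small function. This is an illustrative sketch only: the function and constant names are hypothetical, and the actual approval workflow runs through the REDCap form and the Data Ark team, not through code.

```python
# Approving bodies and the size threshold, as stated in the onboarding rule.
OPERATIONS_TEAM = "Data Ark operations team"
ADVISORY_BOARD = "Data Ark Advisory Board"
SIZE_THRESHOLD_TB = 1.0


def approving_body(size_tb: float) -> str:
    """Return who approves onboarding for a data set of the given size in TB."""
    if size_tb < 0:
        raise ValueError("data set size cannot be negative")
    # Data sets of exactly 1 TB fall under the operations team.
    if size_tb <= SIZE_THRESHOLD_TB:
        return OPERATIONS_TEAM
    return ADVISORY_BOARD


if __name__ == "__main__":
    for size in (0.25, 1.0, 3.5):
        print(f"{size} TB -> {approving_body(size)}")
```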
Data Retention period: The original data owner will receive usage reports every quarter and will be alerted when other researchers are not using their data sets. If usage is low, then the data sets will be removed from Data Ark. Usage is evaluated annually.
To read more information about the Data Ark Onboarding Policy, including data retention and contacts, please click the downloadable “Data Ark Onboarding/Offboarding Policy” PDF below.
Contact Data Ark Team
The Data Ark team manages the data, data access, and data updates. For all inquiries related to the Data Ark, especially to access or utilize data, please email: hpchelp@hpc.mssm.edu
Data Ark Slack Channel
Join our Data Ark Slack channel at https://join.slack.com/t/data-ark/signup and sign up using your Mount Sinai credentials. You will be able to interact with the researchers right away!
Acknowledge CTSA
Please acknowledge CTSA as a funding source for Data Ark in any resulting publications, as follows:
“This work was supported in part through the computational resources and staff expertise provided by Scientific Computing and Data at the Icahn School of Medicine at Mount Sinai and supported by the Clinical and Translational Science Awards (CTSA) grant UL1TR004419 from the National Center for Advancing Translational Sciences.”