About Data Ark

The Data Ark team downloads and organizes the data and performs quality assurance and quality control on it. The team also manages the data access process, answers questions about the data, and updates the data sets to their latest versions. The Data Ark is located on Minerva at /sc/arion/projects/data-ark/. This Mount Sinai data commons is guided by the FAIR principles [1]: making data more findable, accessible, interoperable and reusable. Data Ark includes both public (restricted and unrestricted) and Sinai-generated data sets.

The overarching goal of the Data Ark is to ensure that research data at Mount Sinai are managed, processed and combined in a way that optimizes the power, pace and relevance of our science.

Power: Scientists typically use only a tiny fraction of available data
Pace: Users will have rapid access to large, powerful research data sets
Relevance: Our diverse patient population is ideal for testing the generalizability of our results

Data Ark is an initiative led by Associate Professor Paul O’Reilly and Dean for Scientific Computing and Data Patricia Kovatch, and supported by the Department of Genetics and Genomic Sciences and Scientific Computing. An advisory board has been convened to provide guidance and to help Data Ark become sustainable over time.

We are supported by grant UL1TR004419 from the National Center for Advancing Translational Sciences, National Institutes of Health.

Access Data Ark

Effective January 22, 2024, to access public, Mount Sinai-generated, and restricted data sets, you must read, agree to, and sign the Data Use Agreement (you must be connected through the Mount Sinai campus network or secure remote VPN). Access is granted within 24 hours. Once access is granted, you can run the command module load dataark on Minerva to see the path variables.

The Data Use Agreement is accessible only through the Mount Sinai campus network or secure remote VPN. Open the Data Use Agreement page and choose the data set that you would like to access from the drop-down list. From there you can follow the link to view and agree to the specific Data Use Agreement. You will need to log in with your Sinai account and password, and you can choose only one data set at a time.
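Once access has been granted, the module step described above can be sketched as a short shell session. This is a minimal sketch, not an official procedure: it assumes the dataark module and the /sc/arion/projects/data-ark path named on this page, and it uses the standard module show subcommand to print what the module defines. The DATA_ARK_ROOT variable name is introduced here for illustration only.

```shell
#!/usr/bin/env bash
# Root of the Data Ark on Minerva, as stated on this page.
# DATA_ARK_ROOT is an illustrative name, not a variable set by the module.
DATA_ARK_ROOT="/sc/arion/projects/data-ark"

# The "module" command exists only on systems with environment modules
# (such as Minerva), so guard the calls to keep the script safe elsewhere.
if command -v module >/dev/null 2>&1; then
    module load dataark    # load the Data Ark module
    module show dataark    # print what the module defines, including path variables
else
    echo "module command not found; run this on Minerva"
fi

# List the top-level Data Ark directory if it is visible from this machine.
if [ -d "$DATA_ARK_ROOT" ]; then
    ls "$DATA_ARK_ROOT"
else
    echo "$DATA_ARK_ROOT is not available on this machine"
fi
```

If the script reports that the module command is not found while you are on Minerva, running the two module commands directly in an interactive shell may work instead, since some systems define module only for interactive sessions.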

For more information and for all inquiries relating to the Data Ark, please email hpchelp@hpc.mssm.edu, or join our Data Ark Slack channel at https://join.slack.com/t/data-ark/signup and sign up using your Mount Sinai credentials. You will be able to interact with the researchers and the Data Ark group right away!

Data Ark User Feedback

We have asked Data Ark users for feedback on features and availability of data sets and solicit recommendations for improvement over time. Here are some specific recommendations and comments from Data Ark users:

Data Ark Support Materials

Scientific Computing and Data hosts Data Ark Town Hall and training sessions that are open to current and prospective Data Ark users. Here are the session archives:

Data Sets

The Data Ark is located on Minerva, and the number, type, and diversity of its data sets are continually increasing.

Click here for data sets

Onboarding Data Ark Data Sets

PIs must complete a REDCap form and name the expected research groups. The approval process depends on data set size:

≤1 TB: approved by the Data Ark operations team
>1 TB: must be approved by the Data Ark Advisory Board

Data Retention period: The original data owner will receive usage reports every quarter and will be alerted when other researchers are not using their data sets. If usage is low, then the data sets will be removed from Data Ark. Usage is evaluated annually.

For more information about the Data Ark Onboarding Policy, including data retention and contacts, please see the downloadable “Data Ark Onboarding/Offboarding Policy” PDF below.

Data Ark Onboarding/Offboarding Policy (PDF)

Contact Data Ark Team

The Data Ark team manages the data, data access, and data updates. For all inquiries related to the Data Ark, especially to access or utilize data, please email: hpchelp@hpc.mssm.edu

Data Ark Slack Channel

Join our Data Ark Slack channel at https://join.slack.com/t/data-ark/signup and sign up using your Mount Sinai credentials. You will be able to interact with the researchers right away!

Acknowledge CTSA

Please acknowledge CTSA as a funding source for Data Ark in your publications as follows:

“This work was supported in part through the computational resources and staff expertise provided by Scientific Computing and Data at the Icahn School of Medicine at Mount Sinai and supported by the Clinical and Translational Science Awards (CTSA) grant UL1TR004419 from the National Center for Advancing Translational Sciences.”