
Scientific Computing and Data Resources and Services for Oncology


A computational and data science ecosystem is available to help oncology researchers gain insights and advance their research. It includes high-performance computing (HPC) resources, electronic health record (EHR) cohort query tools, data resources, and electronic data capture systems. Experts are available to support researchers with tools, data access, and consultations.


High Performance Computing

  • Minerva Supercomputer
    Minerva, our high-performance computing resource, arrived in 2012 and has been upgraded continuously, most recently in 2022. Minerva provides 2 petaflops of compute power. Our computational and data scientists are experts at the intersection of several scientific domains and partner directly with researchers to leverage the ecosystem efficiently and effectively. Minerva has contributed to over 1,400 peer-reviewed publications in ten years.
  • NIH STRIDES Access
    NIH STRIDES provides access to cloud resources including Google Cloud, Amazon Web Services (AWS) and Microsoft Azure to help advance biomedical research. This initiative enables access to rich datasets and advanced computational infrastructure, tools, and services.


Electronic Health Records Cohort Query Tools

  • Leaf
    Leaf is a drag-and-drop, self-service query tool application used to search de-identified electronic health record data in the Mount Sinai Data Warehouse. Pre-defined searchable patient cohorts are available for researchers. These include the following:
    • Cancer Institute Biorepository – participants with specimens (tumor tissue and fluids) from Mount Sinai-affiliated hospitals in the Mount Sinai Cancer Institute Biorepository (MSCIB). Requests for access to CIB specimen data on Data Ark can be submitted here.
    • Cancer Patient Cohort – a “pre-calculated” cohort comprising all Mount Sinai patients with a diagnosis in Epic that maps to a cancer-related ICD-9, ICD-10, or SNOMED code.
    • Imaging Research Warehouse 1.0 – the imaging research data warehouse developed by the BioMedical Engineering and Imaging Institute (BMEII), which contains ~525K patient image studies dating back to 2017. Images for patients identified from this cohort may be requested from the BMEII team.
    More information on Leaf can be found here.
  • ATLAS
    ATLAS is a free, publicly available, web-based, open-source, self-service query application developed by the OHDSI community. The ATLAS platform uses the OMOP Common Data Model to query and analyze Epic data from the Mount Sinai Health System. ATLAS supports database exploration, standardized vocabulary browsing, cohort definition, and population-level analysis.
    More information on ATLAS can be found here.

Data Resources

  • Data Ark: A Mount Sinai Data Commons
    The Data Ark comprises public data sets, Mount Sinai-generated data sets, and School-Acquired data sets. New data sets of interest to the research community are added continuously.
    • The Cancer Genome Atlas (TCGA) Database
      Within the Data Ark data commons, the Minerva supercomputer hosts all TCGA biospecimen, clinical, RNA-seq count, and whole-exome sequencing (WXS) Mutation Annotation Format (MAF) data, as well as the TCGA data sets from cBioPortal.
  • Mount Sinai Data Warehouse
    The Mount Sinai Data Warehouse contains all Mount Sinai Health System patient data from the Epic electronic health record system, including over 11 million patient records and data for over 80 million patient encounters. Data are extracted from the Epic EHR and stored in the OMOP Common Data Model.
  • MSDW Oncology Data Mart – 255,909 distinct patients
    Clinical EHR data for patients with a cancer diagnosis in Mount Sinai’s instance of Epic. De-identified data can be searched via Leaf; alternatively, with IRB approval, users can request access to the de-identified or identified data mart.
  • Cancer Registry – 194,564 distinct patients
    A computerized database covering the diagnosis, treatment, and lifetime follow-up of cancer patients cared for by the Mount Sinai Health System. Researchers can request access to the data with appropriate IRB approval. Submit all data requests, together with all relevant documentation, through the MSDW ticket system.
  • IBM® MarketScan®
    MarketScan® Research Databases from IBM® provide one of the longest-running and largest collections of proprietary de-identified claims data for privately and publicly insured people in the U.S.
  • Dashboards
    Custom informational dashboards built in Tableau visualize key data and metrics from a predefined data set in real time, offering rapid insight into trends and growth.


Electronic Data Capture Tools

  • Electronic Research Application Portal (eRAP)
    eRAP is a web-based data capture system compliant with 21 CFR Part 11. It provides custom database development for longitudinal single-site and multisite studies.
  • REDCap
    REDCap is a web-based, HIPAA-compliant electronic data capture system for building online surveys and databases and for eConsent. REDCap is a self-service tool, but Scientific Computing and Data staff provide support and custom development. With the REDCap Clinical Data Pull feature, certain patient data can be imported from Epic in real time.
