Data Modalities - AI Ready Mount Sinai

Data Modalities

We are continually adding new data modalities to AIR·MS as part of our goal of creating a rich, multi-modal dataset. At this time, we offer the following datasets on the platform:

Mount Sinai Data Warehouse

The Mount Sinai Data Warehouse (MSDW) dataset follows the OMOP Common Data Model (CDM). It comprises clinical data extracted from Mount Sinai’s Epic Caboodle database and other ancillary systems.

We offer both an identifiable and a de-identified version of the MSDW dataset in AIR·MS. In addition, many of the standard OMOP tables include extension fields that hold data outside the OMOP standard. Many of these additional attributes are derived directly from Epic (that is, codes used in Epic rather than the standardized OMOP codes) or are attributes not currently contained in the OMOP standard.

Clinical Notes

Clinical notes in the form of unstructured data (progress notes, telephone encounters, nursing notes, procedures, etc.) extracted from the MSDW OMOP identifiable dataset have been loaded into AIR·MS and enabled for search using SAP HANA’s in-memory full-text search capabilities. This feature lets researchers build patient cohorts based on terms contained in unstructured reports in seconds or even milliseconds. Researchers can further filter by note type or an array of other clinical attributes.
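As a rough illustration of the cohort-building idea, the sketch below searches note text for a term and optionally filters by note type. It is not the AIR·MS implementation: it assumes a table shaped like the standard OMOP NOTE table (person_id, note_type_concept_id, note_text), uses an in-memory SQLite database with a plain LIKE match as a stand-in for SAP HANA's full-text search (on HANA a full-text predicate such as CONTAINS would presumably be used instead), and runs on synthetic rows with made-up concept IDs. The function name build_cohort is hypothetical.

```python
import sqlite3


def build_cohort(conn, term, note_type_concept_id=None):
    """Return sorted, distinct person_ids whose notes mention `term`.

    LIKE is only a stand-in for a real full-text search engine.
    """
    sql = "SELECT DISTINCT person_id FROM note WHERE note_text LIKE ?"
    params = [f"%{term}%"]
    if note_type_concept_id is not None:
        # Optional filter on note type, as described in the text above.
        sql += " AND note_type_concept_id = ?"
        params.append(note_type_concept_id)
    sql += " ORDER BY person_id"
    return [row[0] for row in conn.execute(sql, params)]


# Synthetic demo data; concept IDs 1 and 2 are placeholders, not real OMOP concepts.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE note ("
    "note_id INTEGER PRIMARY KEY, person_id INTEGER, "
    "note_type_concept_id INTEGER, note_text TEXT)"
)
conn.executemany(
    "INSERT INTO note VALUES (?, ?, ?, ?)",
    [
        (1, 101, 1, "Progress note: patient reports chest pain on exertion."),
        (2, 102, 2, "Telephone encounter: medication refill requested."),
        (3, 103, 2, "Telephone encounter: caller describes chest pain at rest."),
        (4, 101, 1, "Progress note: chest pain resolved."),
    ],
)

cohort_all = build_cohort(conn, "chest pain")
cohort_phone = build_cohort(conn, "chest pain", note_type_concept_id=2)
print(cohort_all, cohort_phone)
```

In this toy setup the first call returns every patient with a matching note, and the second narrows the cohort to patients whose matching note has the placeholder note type 2.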

Computational Pathology

The pathology metadata will aid researchers in the field of computational pathology. Researchers will be able to build a patient cohort across data modalities and then apply quantitative methods to the analysis of digital microscopy slides, relating the resulting statistical descriptors to patient outcomes.

Mount Sinai Health System users can visit the AIR·MS SharePoint for further access to current data modalities.

External users can email airms-info@mssm.edu for more information.