Mount Sinai Data Warehouse (MSDW) de-identified OMOP Data Set


The Mount Sinai Data Warehouse (MSDW) collects clinical and operational data for use in clinical and translational research, as well as quality improvement initiatives.

The MSDW de-identified OMOP (Observational Medical Outcomes Partnership) Data Set is a complete copy of the MSDW OMOP data mart, extracted from the MSDW SQL database. It contains over 11 million patient records and over 87 million patient encounters (as of September 2021; see Data Sources for more details).

The MSDW ecosystem uses the Observational Health Data Sciences and Informatics (OHDSI) collaborative’s OMOP Common Data Model to facilitate data sharing and interoperability, both internally (at Mount Sinai) and externally with research partners. The data collected by MSDW are extracted from Mount Sinai’s Epic Caboodle database and other ancillary systems, and include demographics, encounters, labs, conditions, diagnoses, medications and more. The data are transformed into the OMOP Common Data Model (CDM) format and loaded into the MSDW database. Refer to OHDSI’s data dictionary for the OMOP CDM for table and field definitions.
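As an illustration only: in the OMOP CDM, each patient is a row in the person table and each encounter (visit) is a row in the visit_occurrence table. The sketch below builds a tiny in-memory SQLite database with a simplified, hypothetical subset of those two tables (a few columns, toy rows, not MSDW data) and counts patients and encounters. It assumes nothing about how the MSDW extract is actually stored or accessed on Minerva; the helper function names are hypothetical.

```python
import sqlite3


def build_toy_omop(conn: sqlite3.Connection) -> None:
    """Create a simplified subset of two OMOP CDM tables with hypothetical toy rows."""
    conn.executescript(
        """
        CREATE TABLE person (
            person_id         INTEGER PRIMARY KEY,
            gender_concept_id INTEGER,
            year_of_birth     INTEGER
        );
        CREATE TABLE visit_occurrence (
            visit_occurrence_id INTEGER PRIMARY KEY,
            person_id           INTEGER REFERENCES person(person_id),
            visit_concept_id    INTEGER,
            visit_start_date    TEXT
        );
        """
    )
    # Toy patients: 8507 = MALE, 8532 = FEMALE (OMOP standard gender concepts).
    conn.executemany(
        "INSERT INTO person VALUES (?, ?, ?)",
        [(1, 8507, 1980), (2, 8532, 1975)],
    )
    # Toy encounters: 9201 = Inpatient Visit, 9202 = Outpatient Visit.
    conn.executemany(
        "INSERT INTO visit_occurrence VALUES (?, ?, ?, ?)",
        [
            (10, 1, 9202, "2021-01-05"),
            (11, 1, 9201, "2021-03-10"),
            (12, 2, 9202, "2021-06-01"),
        ],
    )


def count_patients_and_encounters(conn: sqlite3.Connection) -> tuple[int, int]:
    """Return (number of person rows, number of visit_occurrence rows)."""
    n_patients = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
    n_encounters = conn.execute("SELECT COUNT(*) FROM visit_occurrence").fetchone()[0]
    return n_patients, n_encounters


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_toy_omop(conn)
    print(count_patients_and_encounters(conn))
```

Because every OMOP clinical table keys back to person_id, the same pattern (counting or joining on person_id) carries over to other tables such as condition_occurrence or drug_exposure.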


To use these data, you must read, agree to, and sign the Data User Agreement (you must be logged in through the Mount Sinai campus network or secure remote VPN, and a Minerva HPC account is required). On Minerva, load the dataark module ($ module load dataark) to see the path variables.

More information

Please visit the MSDW page and open a ticket.

Helpful External Web Sites

“Why use the OMOP CDM?” Retrieved May 22, 2023.

Data Ark Data Sets

Please visit the Data Ark Data Set webpage to explore other data sets.