A: You can search the Mount Sinai Data Warehouse (MSDW) to identify eligible cohorts by using one of our de-identified self-service tools: Leaf, ATLAS, or TriNetX. Each tool has an accompanying tutorial for use. Click here to learn more.

Q: What data sources are available in the Data Warehouse?

A: The majority of the data provided by the MSDW comes from the Epic (Verona, WI) electronic health record (EHR), which is the primary EHR for the Mount Sinai Health System (MSHS). Epic was implemented as the inpatient EHR at Mount Sinai Hospital (MSH) in late 2011. The MSDW provides researchers access to data on the more than 12 million unique patients in the MSHS Epic EHR, nearly 5.5 million of whom have had an encounter. In total, there are over 115 million patient encounters recorded in Epic. The clinical data is extracted from the Epic Caboodle and Clarity databases, transformed to the OMOP Common Data Model (CDM), and stored in the MSDW2 database. Epic data can be supplemented with other data resources, such as insurance information from MSX and endoscopy data from Provation.

Q: How far back does the data go?

Q: How often is the data refreshed?

A: The data in the MSDW is refreshed nightly.

Q: What data is available in MSDW?

A: MSDW data broadly includes clinical and operational data, specifically patient encounters and associated diagnoses; patient medical, surgical, social, family, and immunization histories; patient assessments; vitals; clinical orders including lab, radiology, pathology, and medications; medication administrations; clinical progress notes; and clinical discharge summaries. The vast majority of this data is drawn from Mount Sinai Health System’s Epic EHR. Our Data Analysts can help you determine what will work best for your research and ensure your requested data set is accurate and comprehensible.

Also, please see our comprehensive list of data sources for more details here.

Q: Does MSDW have de-identified and PHI data?

A: MSDW contains PHI data, which is accessible with an approved IRB or QI documentation. Researchers requesting data without an IRB will receive de-identified data. The de-identified data set is provided with confirmation that the data have no HIPAA identifiers. Please note that the MSDW team does not offer de-identified health information in free text fields (such as progress notes). Click here to learn more about the de-identified data service.

Q: What information is available in the Enterprise Data Catalog?

A: The Informatica Enterprise Data Catalog is intended to include all three types of metadata about our OMOP databases in MSDW:

Metadata Type	Alternative Name	Definition
Technical	Structural	Data that provides information about containers of data (documents, web pages, database tables)
Business	Descriptive	Information about the business meaning of data (e.g. description of a KPI); data definitions (entities, attributes, elements)
Process	Administrative	Data logged throughout the operation of a DW or system (e.g. transformation steps, errors, CPU consumption); author, create time; data lineage, source-to-target mappings

We have all technical metadata and the data lineage information because they can be obtained programmatically from the database engine. The technical metadata for our OMOP tables does not strictly match OHDSI’s specifications for the OMOP CDM in all cases. OHDSI allows this, as long as the changes are “compatible”.
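As a rough illustration of what obtaining technical metadata "programmatically from the database engine" can look like, the sketch below reads column-level metadata from an in-memory SQLite database using Python's standard library. SQLite, the cut-down person table, and the describe_table helper are assumptions made for illustration only; the MSDW2 engine and the Enterprise Data Catalog scanner are not described here and may work quite differently.

```python
import sqlite3


def describe_table(conn: sqlite3.Connection, table: str) -> list[dict]:
    """Return column-level technical metadata for one table (hypothetical helper)."""
    # PRAGMA table_info yields one row per column: cid, name, type, notnull, dflt_value, pk
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [
        {
            "column": name,
            "type": col_type,
            "nullable": not notnull,
            "primary_key": bool(pk),
        }
        for _cid, name, col_type, notnull, _default, pk in rows
    ]


# A cut-down stand-in for an OMOP-style table; not the MSDW2 definition.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    " person_id INTEGER PRIMARY KEY,"
    " gender_concept_id INTEGER NOT NULL,"
    " year_of_birth INTEGER NOT NULL,"
    " birth_datetime TEXT)"
)

for column in describe_table(conn, "person"):
    print(column)
```

Because the engine itself reports this information, a catalog can be refreshed by a script rather than maintained by hand. Business metadata, such as a plain-language definition of a column, has no equivalent engine query and has to be written by people.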

Q: Is there a centralized information point regarding all data available to Sinai researchers?

A: Yes, there is. You can browse here for more information on the data that is available in the MSDW.

Q: If I want identifiable data or a custom report, what should I do?

A: Data with identifiable information is available with a valid IRB or QI documentation detailing the PHI allowed. When completing a ticket to request MSDW data, include a copy of the valid IRB or QI documentation along with the details of the data request. Click here to submit a ticket. Click here to learn more about the IRB process through the Office of Research Services.

Q: Which Cohort Query Tool should I use?

A: If you are searching for de-identified diagnoses, procedures, encounters, labs, medications, vitals, or demographics and want to review generated visualizations of demographic details, Leaf may be the intuitive self-service tool for you. Leaf offers de-identified data only, and therefore no IRB or QI documentation is required to browse Leaf. Click here to see Leaf’s user tutorial.

If you are searching for facilities, diagnoses, procedures, medications, labs, orders, and patient demographics using an OMOP-centric standard vocabulary, ATLAS may be the dynamic query tool for you. ATLAS offers de-identified data to all Mount Sinai users and contains PHI data and custom datamarts for researchers with valid IRB or QI documentation. Click here to see ATLAS’ user tutorial.

If you are searching for demographics, diagnoses, procedures, medications, labs, vitals, BioMe patients, and Image Research Warehouse patients, TriNetX may be the clinical research query tool for you. Though the global research network can be slow to operate at times, de-identified data and imagery are available to users. Click here to see TriNetX’s user tutorial.

Still unsure which tool is best for you? Click here to see a comparison table.

Q: How do you compare Leaf, ATLAS, and TriNetX from a user perspective?

A: Leaf and ATLAS query MSDW2, which is run by Scientific Computing and Data. The TriNetX tool queries a third-party database to which Mount Sinai contributes data. The functionality of TriNetX for defining a patient cohort is very similar to that of Leaf and ATLAS. To compare data types, advantages, and disadvantages of the three tools, refer to the MSDW Services comparison table.

Q: I am getting an error when trying to access or log in to Leaf or ATLAS. What do I do?

A: All users need to be connected to the Mount Sinai network, either on site or through a VPN tunnel, to access these tools. If you are still getting an error message in your browser, please submit a help ticket.

Q: Do I need to request access every time I want to use a query tool to review de-identified data?

A: Nope! Requesting access to a query tool will establish a user account. Each user must have their own account to access a query tool, and once an account is set up a user can keep using their login credentials to access de-identified data.

For users requesting PHI data, a valid IRB must be submitted with the access request. When the IRB expires, a renewed IRB must be submitted to continue accessing approved PHI data. Click here to submit an access request.

Q: Is it possible to get the number of patients with a particular disease who are not getting treatment (i.e. new diagnosis)?

A: The “Conditions” being queried by both Leaf and ATLAS in MSDW2 are the diagnoses from Epic. These include encounter diagnoses, problem list, hospital problems, past medical history, etc. If the patient has a diagnosis documented in any of these ways, the query tools will pick them up, regardless of whether the patient is getting treatment or not.
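To make the distinction concrete, here is a minimal sketch of how a diagnosis count could be narrowed to patients with no recorded treatment in OMOP-shaped data. The table and column names (condition_occurrence, drug_exposure, person_id, condition_concept_id, drug_concept_id) follow the OMOP CDM, but the concept IDs and rows are placeholders, SQLite stands in for the real database, and nothing here describes how Leaf or ATLAS build their queries. The absence of a drug record is also only a proxy for "not getting treatment".

```python
import sqlite3

# In-memory stand-in for two OMOP CDM tables, with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE condition_occurrence (person_id INTEGER, condition_concept_id INTEGER);
    CREATE TABLE drug_exposure (person_id INTEGER, drug_concept_id INTEGER);
    INSERT INTO condition_occurrence VALUES (1, 1001), (1, 1001), (2, 1001), (3, 1001), (4, 9999);
    INSERT INTO drug_exposure VALUES (1, 2001), (3, 5555);
    """
)

CONDITION_ID = 1001  # placeholder concept ID for the condition of interest
TREATMENT_ID = 2001  # placeholder concept ID for a treatment drug

# Every patient with the diagnosis documented, treated or not.
diagnosed = conn.execute(
    "SELECT COUNT(DISTINCT person_id) FROM condition_occurrence"
    " WHERE condition_concept_id = ?",
    (CONDITION_ID,),
).fetchone()[0]

# Patients with the diagnosis and no recorded exposure to the treatment drug.
untreated = conn.execute(
    """
    SELECT COUNT(DISTINCT c.person_id)
    FROM condition_occurrence AS c
    WHERE c.condition_concept_id = ?
      AND NOT EXISTS (
          SELECT 1 FROM drug_exposure AS d
          WHERE d.person_id = c.person_id AND d.drug_concept_id = ?
      )
    """,
    (CONDITION_ID, TREATMENT_ID),
).fetchone()[0]

print(f"diagnosed: {diagnosed}, diagnosed without recorded treatment: {untreated}")
```

A real request would normally use a set of drug or procedure concepts rather than a single ID, and a time window relative to the diagnosis date. An MSDW analyst can help define both.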

Q: How much are the Data Warehouse services?

A: Custom queries with our analysts are charged at a rate of $180 per hour. A fund number is required to get started. You can open a ticket to request information here. Please be sure to budget for this when developing your grant applications.

If you need a cost estimate ahead of time, submit an MSDW Data Request ticket and specify the request as a Cost Estimate Only. The MSDW team will provide a cost estimate and, from there, you may choose to move forward with the data request.
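For budgeting purposes, the hourly rate above turns into a simple back-of-the-envelope calculation. The sketch below is illustrative only: the estimate_cost helper and the hour figures are hypothetical, and the actual estimate always comes from the MSDW team.

```python
from decimal import Decimal

HOURLY_RATE = Decimal("180")  # custom-query rate stated above, in USD per hour


def estimate_cost(analyst_hours: float) -> Decimal:
    """Return a rough cost for a given number of analyst hours (hypothetical helper)."""
    if analyst_hours < 0:
        raise ValueError("analyst_hours must be non-negative")
    return HOURLY_RATE * Decimal(str(analyst_hours))


# Hypothetical effort levels for illustration, not MSDW estimates.
for hours in (2, 7.5, 20):
    print(f"{hours} h -> ${estimate_cost(hours):,.2f}")
```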

Q: Do I have to give a fund number?

A: Yes, a fund number must be provided for all IRB-approved projects. Quality, process, financial improvement, or similar (non-research) projects are covered by combined Hospital/School support of the Data Warehouse. There is no chargeback to individual requestors for these types of projects. See the MSDW chargeback policy here.

Q: What paperwork/approvals do I need for an IRB research project?

A: All IRB project requests should include an IRB approval letter and protocol document. You can learn more about the IRB process through the Office of Research Services here.

Q: What is the turnaround time for a custom data set?

A: Turnaround time varies. A standard data request takes 2-3 weeks. The time from submitting a data request to receiving the requested data set may range from 1 to 6 weeks, depending on the complexity of the data, change requests mid-process, and billing/fund processing. Click here to review the Custom Data Set Delivery process and the steps involved.

Q: Can I receive a cost and time estimate before agreeing to my data request?

A: Yes, cost estimates for approved projects will be provided to the user based on the complexity of the request and the department’s chargeback policy. Please note that if project specifications change, the final price may be adjusted.

Q: Can I submit a recurring request?

A: Yes. For a recurring data delivery request, a new ticket must be created and submitted via JIRA, which can be found here.

Q: Can I make changes to my data request requirements?

A: Yes, changes can be made to your data request. Please note that these changes may affect the cost and timeline of the original request.

Q: What if there is an issue with the data after my ticket has been closed?

A: If there is an issue with your data after your ticket has been closed, please open a new ticket for “existing reports” and provide the original request ticket number. You can do this here.

Q: When do I get my data?

A: Once you have approved the data during a meeting with your Data Analyst, an invoice will be emailed to you. Upon payment, you will receive the remainder of your data.

Q: How will my data be provided to me?

A: Data delivery methods conform to Mount Sinai IT Security Officer and Privacy Officer policies.

1. via Mount Sinai email
Data may be sent to MSDW users via internal Mount Sinai email without leaving Mount Sinai’s secured network. According to policy, these files are not encrypted.

2. via external email
Data may be sent to MSDW users via external email systems. All data sent via external email must be encrypted and password protected.

3. via shared drives
Files too large to be sent via email may be written to properly secured departmental shared drives. These files must be encrypted and password protected, to protect against inadvertent access by other users of the shared drives.

4. via FTP
Large files may be pushed via secured FTP to verified, user-supplied FTP servers. These files must be encrypted and password protected, to protect against inadvertent delivery to an unsecured or unintended FTP site.

5. via secured data media
Files may be delivered by copying to a secured encrypted thumb drive of a type authorized by the Mount Sinai IT Security Officer. When this type of device is utilized, it must be hand-delivered to the intended user.

6. via Mount Sinai OneDrive
Large files may be uploaded to Mount Sinai OneDrive. These files must be encrypted and password protected, to protect against inadvertent delivery to an unsecured or unintended user.

7. Sending data externally
For cooperative research involving external collaborators or other institutions, data may be delivered to identified collaborators, provided that they are named in the research plan/IRB approval. The data may be sent via any of the above methods.

User responsibility
It is emphasized in training, and reinforced when data is delivered, that users are expected to abide by Mount Sinai IRB and PPHS policy, HIPAA/Privacy regulations, and Mount Sinai IT Security policy, with regard to proper use of data and protection of PHI.

Q: What is Epic Clarity?

A: Clarity is an SQL server database that contains a subset of the data from Epic. It is used to run complex, data-intensive reports. Data within Clarity includes radiology, oncology, surgery, patient demographics, and patient medical history.

Q: What is Epic Caboodle?

A: Caboodle is Epic’s enterprise data warehouse that supports clinical information systems.

Q: What is OMOP?

A: The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) is a data standard for medical terms. This common vocabulary enables standardized analytics across clinical domains. Preserving consistency in data ensures that researchers are consistently accessing clean, quality data in queries. Click here to learn more about OMOP CDM version 6.0.
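As a small illustration of how a common vocabulary works, the sketch below maps codes from two different source vocabularies onto one standard concept. The table and column names (concept, concept_relationship, the "Maps to" relationship, standard_concept = 'S') follow the published OMOP CDM vocabulary tables; every row, code, and vocabulary name is invented for illustration, and SQLite stands in for the real database.

```python
import sqlite3

# In-memory stand-in for two OMOP vocabulary tables, with invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE concept (
        concept_id INTEGER PRIMARY KEY, concept_name TEXT,
        vocabulary_id TEXT, concept_code TEXT, standard_concept TEXT);
    CREATE TABLE concept_relationship (
        concept_id_1 INTEGER, concept_id_2 INTEGER, relationship_id TEXT);
    INSERT INTO concept VALUES
        (10, 'Example disease (source A)', 'SOURCE_A', 'A-123', NULL),
        (11, 'Example disease (source B)', 'SOURCE_B', 'B-9', NULL),
        (20, 'Example disease', 'STANDARD', 'S-1', 'S');
    INSERT INTO concept_relationship VALUES (10, 20, 'Maps to'), (11, 20, 'Maps to');
    """
)


def to_standard(vocabulary_id: str, concept_code: str) -> list[tuple[int, str]]:
    """Look up the standard concept(s) that a source code maps to."""
    return conn.execute(
        """
        SELECT std.concept_id, std.concept_name
        FROM concept AS src
        JOIN concept_relationship AS rel
          ON rel.concept_id_1 = src.concept_id AND rel.relationship_id = 'Maps to'
        JOIN concept AS std
          ON std.concept_id = rel.concept_id_2 AND std.standard_concept = 'S'
        WHERE src.vocabulary_id = ? AND src.concept_code = ?
        """,
        (vocabulary_id, concept_code),
    ).fetchall()


# Two different source codes resolve to the same standard concept.
print(to_standard("SOURCE_A", "A-123"))
print(to_standard("SOURCE_B", "B-9"))
```

Because both source codes resolve to the same standard concept, a single query on that concept finds patients regardless of which coding system the original record used.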

Q: Where can I get more information about OMOP Data in MSDW?

A: Details about MSDW2 OMOP data, including Epic codes, data tables, and clinical notes, are covered in our new documentation, “MSDW2 OMOP Data Questions and Answers Interview.”

Q: What are ways I can meet with the MSDW team to discuss my needs?

A: The best first step is to submit a ticket, even for an estimate. After review, the MSDW team will schedule a meeting with you, likely through Zoom, to discuss your research needs.

The MSDW team supports a Digital Concierge Service, hosting open office hours on Wednesdays from 3:30-4:30 pm. Meeting with an MSDW analyst for a quick clarification or brief question is best through Digital Concierge. Click here for more information on this walk-in clinic.

Q: Who can I contact if I have questions?

A: The MSDW team can be reached through the ticketing service. You can submit support requests, bug reports, inquiries, and research needs through the ticketing system here.