De-identified Digital Pathology Slides (Coming Soon)
Overview
The Department of Pathology, Molecular and Cell-Based Medicine, the Windreich Department of AI and Human Health, and Scientific Computing and Data have collaborated to share this extensive digital archive of over 1.5 million whole slide images collected from the Mount Sinai Anatomic Pathology and Consultation Service. These specimens encompass a broad spectrum of biopsies, resections, and autopsies, reflecting the diversity of diseases affecting patients from a wide range of backgrounds. Virtually every organ system is represented in the collection, including but not limited to the lung, heart, pancreas, kidney, and liver, as well as the gastrointestinal, genitourinary, gynecological, hematological, and nervous systems. The disease processes include neoplastic, developmental, inflammatory, toxic, metabolic, genetic, degenerative, traumatic, and infectious pathologies. The slides were prepared using a variety of staining techniques, from routine hematoxylin and eosin (H&E) to specialized stains such as silver and trichrome, as well as immunohistochemistry. This rich dataset offers a unique and powerful resource for advancing the study of human disease through digital pathology.
What Is Hosted Under Data Ark?
Data Ark currently hosts 1.5 million de-identified digital pathology slides, and more slides will be made available on a continuing basis. The slides have been linked to patients’ electronic health records (EHR), and the EHR data is available from the Mount Sinai Data Warehouse. Slides were scanned at 40x magnification on the Philips Ultrafast or another system. The slides served through Data Ark have been de-identified, and the original iSyntax files have been converted to a readable TIF format.
Figure 1. Digitized pathology slides are gigapixel images that can span hundreds of thousands of pixels in each dimension.
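To give a sense of that scale, the sketch below estimates the uncompressed size and tile count of a single whole slide image. The 100,000 x 100,000 pixel dimensions, the 3 bytes per pixel (8-bit RGB), and the 512-pixel tile size are illustrative assumptions, not properties of any particular Data Ark slide.

```python
def uncompressed_bytes(width, height, bytes_per_pixel=3):
    """Size in bytes of a raw image, assuming 8-bit RGB by default."""
    return width * height * bytes_per_pixel


def tile_count(width, height, tile=512):
    """Number of square tiles needed to cover the image (edge tiles included)."""
    tiles_across = -(-width // tile)   # ceiling division without floats
    tiles_down = -(-height // tile)
    return tiles_across * tiles_down


if __name__ == "__main__":
    # Hypothetical slide dimensions chosen only for illustration.
    width, height = 100_000, 100_000
    size_gb = uncompressed_bytes(width, height) / 1e9
    print(f"Uncompressed size: {size_gb:.0f} GB")
    print(f"512 x 512 tiles:   {tile_count(width, height):,}")
```

At these assumed dimensions the raw pixel data would not fit in the memory of a typical workstation, which is why whole slide images are usually read one region or tile at a time rather than loaded in full.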
Digital pathology slides were collected from 191,119 patients through October 1, 2024, with demographic information detailed below.
Table 1. Cohort demographics: (a) gender, (b) race and ethnicity, and (c) age.
(a) Gender

Gender | Count |
---|---|
Female | 120,121 |
Indeterminate | 13 |
Male | 70,945 |
Unknown | 40 |

(b) Race and ethnicity

Race and Ethnicity (Combined) | Count |
---|---|
American Indian or Alaska Native | 140 |
Asian | 14,071 |
Black or African-American | 28,323 |
Hispanic | 35,508 |
Native Hawaiian or Pacific Islander | 159 |
Patient Declined | 958 |
White | 66,515 |
Unknown/Other/Not Reported | 45,445 |

(c) Age

Age Group | Count |
---|---|
0-10 | 1,377 |
11-20 | 3,239 |
21-30 | 14,950 |
31-40 | 28,684 |
41-50 | 31,704 |
51-60 | 35,551 |
61-70 | 38,128 |
71-80 | 26,597 |
81-90 | 9,424 |
91+ | 1,465 |
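As a quick consistency check, the counts in each part of Table 1 can be summed and compared with the 191,119 patients reported for the cohort. The sketch below is illustrative only: the dictionary and function names are invented for this example, and the numbers are copied from Table 1.

```python
EXPECTED_PATIENTS = 191_119  # cohort size stated in the text

gender = {"Female": 120_121, "Indeterminate": 13, "Male": 70_945, "Unknown": 40}

race_ethnicity = {
    "American Indian or Alaska Native": 140,
    "Asian": 14_071,
    "Black or African-American": 28_323,
    "Hispanic": 35_508,
    "Native Hawaiian or Pacific Islander": 159,
    "Patient Declined": 958,
    "White": 66_515,
    "Unknown/Other/Not Reported": 45_445,
}

age_group = {
    "0-10": 1_377, "11-20": 3_239, "21-30": 14_950, "31-40": 28_684,
    "41-50": 31_704, "51-60": 35_551, "61-70": 38_128, "71-80": 26_597,
    "81-90": 9_424, "91+": 1_465,
}


def totals_match(table, expected=EXPECTED_PATIENTS):
    """Return True when the category counts add up to the cohort size."""
    return sum(table.values()) == expected


if __name__ == "__main__":
    for name, table in [("gender", gender),
                        ("race/ethnicity", race_ethnicity),
                        ("age group", age_group)]:
        print(f"{name}: total={sum(table.values()):,} match={totals_match(table)}")
```

Each breakdown sums to the same cohort total, so every patient is counted exactly once in each of the three tables.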
Access (Coming Soon)
To use this data, you must read, agree to, and sign the Data Use Agreement (you must be logged in through the Mount Sinai campus network or secure remote VPN). IRB (Institutional Review Board) approval is not required to access Digital Pathology Slide data via Data Ark.
Once access has been granted, you can reach the slides directly in their folder on Minerva. To get the path variable, load the module by issuing the command $ module load dataark.
In addition, we are working on a Digital Slide Archive web application for interactive slide viewing and annotation (coming soon).
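Once the module is loaded, slide files can be located from a script. The sketch below is a minimal example under stated assumptions: the environment variable name DATAARK_PATHOLOGY_DIR is hypothetical (running module show dataark should reveal the variable the module actually sets), and the slides are assumed to carry a .tif or .tiff extension.

```python
import itertools
import os
from pathlib import Path

# Hypothetical variable name; replace with the one set by the dataark module.
SLIDE_DIR_ENV = "DATAARK_PATHOLOGY_DIR"


def iter_slides(root):
    """Lazily yield paths of TIF files found anywhere under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith((".tif", ".tiff")):
                yield Path(dirpath) / name


def first_slides(root, limit=5):
    """Return up to `limit` slide paths without scanning the whole archive."""
    return list(itertools.islice(iter_slides(root), limit))


if __name__ == "__main__":
    root = os.environ.get(SLIDE_DIR_ENV)
    if root is None:
        print(f"{SLIDE_DIR_ENV} is not set; load the module first.")
    else:
        for path in first_slides(root):
            print(path)
```

The generator stops as soon as enough files have been found, which matters for an archive of this size: building a full listing of more than a million files just to inspect a handful would be slow.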
Digital pathology technology reduces processing time
Figure 2. Value stream mapping of a simple consult using (a) a traditional glass slide vs. (b) a digital slide. Image adapted from Haghighi, Mehrvash, et al. 2021. doi: 10.4103/jpi.jpi_74_21.
Data Ark Data Sets
Please visit the Data Ark Data Set webpage to explore other data sets.