SEER
SEER Data & Software
SEER Incidence Data
- Description
- The Surveillance, Epidemiology, and End Results (SEER) Program provides information on cancer statistics in an effort to reduce the cancer burden among the U.S. population. SEER is supported by the Surveillance Research Program (SRP) in NCI’s Division of Cancer Control and Population Sciences (DCCPS).
- Link to website: Surveillance, Epidemiology, and End Results Program (cancer.gov)
- Training
- Example publications using SEER data:
- Mourad, M., Jetmore, T., Jategaonkar, A. A., Moubayed, S., Moshier, E., & Urken, M. L. (2017). Epidemiological Trends of Head and Neck Cancer in the United States: A SEER Population Study. Journal of oral and maxillofacial surgery : official journal of the American Association of Oral and Maxillofacial Surgeons, 75(12), 2562–2572. https://doi.org/10.1016/j.joms.2017.05.008
- Tsao, C. K., Small, A. C., Kates, M., Moshier, E. L., Wisnivesky, J. P., Gartrell, B. A., Sonpavde, G., Godbold, J. H., Palese, M. A., Hall, S. J., Oh, W. K., & Galsky, M. D. (2013). Cytoreductive nephrectomy for metastatic renal cell carcinoma in the era of targeted therapy in the United States: a SEER analysis. World journal of urology, 31(6), 1535–1539. https://doi.org/10.1007/s00345-012-1001-3
https://seer.cancer.gov/data/
SEER Statistical Software
- SEER*Explorer can be used to view cancer statistics by race, age, gender, stage, and cancer subtype, drawn from SEER data covering 48% of the U.S. population.
SEER Linked Databases
SEER Data Linkage: SEER-CAHPS, SEER-Medicare, and SEER-MHOS (YouTube)
SEER-Medicare Linked Database
- Description
- The SEER-Medicare data link the SEER cancer registry with Medicare claims data. Linking these two sources yields a unique population-based source of information.
- Link to website: SEER-Medicare Linked Data Resource (cancer.gov)
- Use requires an application, data use agreement, and fees
- Analytic Support
- Example publications using SEER-Medicare data:
- Taioli, E., Wolf, A., Alpert, N., Rosenthal, D., & Flores, R. (2023). Malignant pleural mesothelioma characteristics and outcomes: A SEER-Medicare analysis. Journal of surgical oncology, 128(1), 134–141. https://doi.org/10.1002/jso.27243
- Wolf, A., Alpert, N., Tran, B. V., Liu, B., Flores, R., & Taioli, E. (2019). Persistence of racial disparities in early-stage lung cancer treatment. The Journal of thoracic and cardiovascular surgery, 157(4), 1670–1679.e4. https://doi.org/10.1016/j.jtcvs.2018.11.108
SEER-Medicare Health Outcomes Survey (MHOS) Linked Database
- Description
- The SEER-Medicare Health Outcomes Survey (MHOS) Linked Database links the SEER cancer registry with CMS’s Medicare Health Outcomes Survey.
- Goal: to improve understanding of health-related quality of life among cancer patients and survivors enrolled in Medicare Advantage Organizations.
- Website: SEER-Medicare Health Outcomes Survey (SEER-MHOS) Linked Data Resource (cancer.gov)
- Accessing the data
- Obtaining SEER-MHOS Data
- Obtaining SEER-MHOS data requires an application process and fees
- Example Publications using MHOS
- Park, J., Kent, E. E., Lund, J. L., Anderson, C., Olshan, A. F., Brewster, W. R., & Nichols, H. B. (2023). Adjuvant radiation therapy and health-related quality of life among older women with early-stage endometrial cancer: an analysis using the SEER-MHOS linkage. Cancer causes & control : CCC, 34(3), 223–231. https://doi.org/10.1007/s10552-022-01658-8
- Park, C., Park, S. K., Woo, A., & Ng, B. P. (2022). Health-related quality of life among elderly breast cancer patients treated with adjuvant endocrine therapy: a U.S Medicare population-based study. Quality of life research : an international journal of quality of life aspects of treatment, care and rehabilitation, 31(5), 1345–1357. https://doi.org/10.1007/s11136-021-03059-x
SEER-Consumer Assessment of Healthcare Providers and Systems (SEER-CAHPS) Linked Data Resource
- Description
- A linkage between SEER cancer registry data and CMS’s Medicare Consumer Assessment of Healthcare Providers and Systems (CAHPS®) patient surveys.
- Goal: to allow analysis of Medicare beneficiaries’ experiences with care across the cancer care continuum and comparison of experiences across Medicare programs.
- Website: SEER-CAHPS Linked Data Resource
- How SEER-CAHPS differs from SEER-Medicare & SEER-MHOS (webpage + video)
- Accessing Data
- Example Publications using SEER-CAHPS
- Pandit, A. A., Gressler, L. E., Halpern, M. T., Kamel, M., Payakachat, N., & Li, C. (2022). Racial/Ethnic Disparities in Patient Care Experiences among Prostate Cancer Survivors: A SEER-CAHPS Study. Current oncology (Toronto, Ont.), 29(11), 8357–8373. https://doi.org/10.3390/curroncol29110659
- Lines, L. M., Cohen, J., Kirschner, J., Barch, D. H., Halpern, M. T., Kent, E. E., Mollica, M. A., & Smith, A. W. (2022). Associations between illness burden and care experiences among Medicare beneficiaries before or after a cancer diagnosis. Journal of geriatric oncology, 13(5), 731–737. https://doi.org/10.1016/j.jgo.2022.02.017
SEER-Medicaid Linked Data Resource
- Description
- The linkage between SEER and CMS’s Medicaid enrollment information
- Website: SEER-Medicaid Linked Data Resource
- Accessing Data
- Programming Support
- Example Publications using SEER-Medicaid
- Ganga, A., Kim, E. J., Lee, J. Y., Leary, O. P., Sastry, R. A., Fridley, J. S., Chang, K. E., Niu, T., Sullivan, P. Z., Somasundar, P. S., & Gokaslan, Z. L. (2023). Disparities in primary spinal osseous malignant bone tumor survival by Medicaid-status: a national population-based risk analysis. World neurosurgery, S1878-8750(23)01371-2. Advance online publication. https://doi.org/10.1016/j.wneu.2023.09.103
- Wang, S., & Ge, C. (2023). High risk of non-cancer mortality in bladder cancer patients: evidence from SEER-Medicaid. Journal of cancer research and clinical oncology, 149(12), 10203–10215. https://doi.org/10.1007/s00432-023-04867-z
Cancer Registry Data
National Cancer Database (NCDB)
- Description
- The National Cancer Database (NCDB) is a clinical oncology database which sources its hospital registry data from more than 1,500 Commission on Cancer-accredited facilities.
- Link to website: National Cancer Database (NCDB) | ACS (facs.org)
- “Using the National Cancer Database for Outcomes Research: A Review”
- NCDB Participant User Data Files (PUFs) are only available through an application process to investigators associated with CoC-accredited cancer programs.
- Information on obtaining NCDB PUF: Participant User Files | ACS (facs.org)
- Mount Sinai Only: Steps to Apply for NCDB Data.pdf
- Example publications using NCDB data:
- Galsky, M. D., Diefenbach, M., Mohamed, N., Baker, C., Pokhriya, S., Rogers, J., Atreja, A., Hu, L., Tsao, C. K., Sfakianos, J., Mehrazin, R., Waingankar, N., Oh, W. K., Mazumdar, M., & Ferket, B. S. (2017). Web-Based Tool to Facilitate Shared Decision Making With Regard to Neoadjuvant Chemotherapy Use in Muscle-Invasive Bladder Cancer. JCO clinical cancer informatics, 1, 1–12. https://doi.org/10.1200/CCI.17.00116
- Zeidman, M., Alberty-Oller, J. J., Ru, M., Pisapati, K. V., Moshier, E., Ahn, S., Mazumdar, M., Port, E., & Schmidt, H. (2020). Use of neoadjuvant versus adjuvant chemotherapy for hormone receptor-positive breast cancer: a National Cancer Database (NCDB) study. Breast cancer research and treatment, 184(1), 203–212. https://doi.org/10.1007/s10549-020-05809-w
- Galsky, M. D., Stensland, K. D., Moshier, E., Sfakianos, J. P., McBride, R. B., Tsao, C. K., Casey, M., Boffetta, P., Oh, W. K., Mazumdar, M., & Wisnivesky, J. P. (2016). Effectiveness of Adjuvant Chemotherapy for Locally Advanced Bladder Cancer. Journal of clinical oncology : official journal of the American Society of Clinical Oncology, 34(8), 825–832. https://doi.org/10.1200/JCO.2015.64.1076
- Galsky, M. D., Stensland, K., Sfakianos, J. P., Mehrazin, R., Diefenbach, M., Mohamed, N., Tsao, C. K., Boffetta, P., Wiklund, P., Oh, W. K., Mazumdar, M., & Ferket, B. (2016). Comparative Effectiveness of Treatment Strategies for Bladder Cancer With Clinical Evidence of Regional Lymph Node Involvement. Journal of clinical oncology : official journal of the American Society of Clinical Oncology, 34(22), 2627–2635. https://doi.org/10.1200/JCO.2016.67.503
National Program of Cancer Registries (NPCR)
- Description
- The National Program of Cancer Registries (NPCR) is operated by the CDC and provides states and territories with financial support and training to collect population-based cancer incidence data.
- Link to website: NPCR
- Cancer Registry Research Approval Process: Classification of States by Level of Approval Required
- Publications using NPCR data:
- Benard, V. B., Greek, A., Jackson, J. E., Senkomago, V., Hsieh, M. C., Crosbie, A., Alverson, G., Stroup, A. M., Richardson, L. C., & Thomas, C. C. (2019). Overview of Centers for Disease Control and Prevention’s Case Investigation of Cervical Cancer Study. Journal of women’s health (2002), 28(7), 890–896. https://doi.org/10.1089/jwh.2019.7849
North American Association of Central Cancer Registries (NAACCR)
- Description
- The North American Association of Central Cancer Registries (NAACCR) receives and combines de-identified population-based cancer data from registries across the US and Canada to create the Cancer in North America (CiNA) Data Products.
- CiNA Data Products
- Requesting CiNA Datasets
- NAACCR Data Request Tracking system
- Accessing data requires, at minimum, a signed Data Use or Data Assurance agreement and may require at least one member of the research team to be a NAACCR Member
New York State Cancer Registry
- Description
- The New York State Cancer Registry collects, processes, and reports information about New Yorkers diagnosed with malignant cancer, as submitted by reporting facilities such as hospitals.
- Data Products
- New York State Public Access Cancer Epidemiology Data (NYSPACED)
- NYSPACED currently offers two password-protected datasets:
- cancer incidence by county
- cancer incidence by New York City neighborhood
- Request a password
- Variables available in the NYSPACED data files:
- Age
- Sex
- Race
- Hispanic origin
- New York State county or New York City neighborhood
- Year of diagnosis
- Site of cancer
- Morphology (histology, behavior, and behavior recode for analysis)
- Laterality
- Grade
- Diagnostic confirmation
- Summary stage at diagnosis
- Sequence number – central
- Surgery of primary cancer site
- Radiation
- Publications using NY State Cancer Registry
- Gates Kuliszewski, M., Boscoe, F. P., Wagner, V. L., & Schymura, M. J. (2021). Health Care Utilization Prior to Ovarian Cancer Diagnosis in Publicly Insured Individuals in New York State. Journal of registry management, 48(3), 126–137.
- Kahn, J. M., Zhang, X., Kahn, A. R., Castellino, S. M., Neugut, A. I., Schymura, M. J., Boscoe, F. P., & Keegan, T. H. M. (2021). Racial Disparities in Children, Adolescents, and Young Adults with Hodgkin Lymphoma Enrolled in the New York State Medicaid Program. Journal of Adolescent and Young Adult Oncology, 11(4), 360-369. https://doi.org/10.1089/jayao.2021.0131
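The NYSPACED variables listed above (county or neighborhood, year of diagnosis, site of cancer, and so on) lend themselves to simple tabulations. The following is a minimal Python sketch, assuming each row of a file is one case record. The column names (`county`, `year_of_diagnosis`, `site`, `sex`) and the sample rows are hypothetical and made up purely for illustration; they are not real registry data, and the actual NYSPACED file layout may differ.

```python
import csv
import io
from collections import Counter

# Made-up illustrative rows, NOT real registry data.
# Column names are hypothetical; check the actual NYSPACED documentation.
SAMPLE = """county,year_of_diagnosis,site,sex
Albany,2018,Lung,F
Albany,2018,Breast,F
Kings,2019,Lung,M
Albany,2019,Lung,M
"""


def count_cases(rows, keys):
    """Count case records grouped by the given column names."""
    counts = Counter()
    for row in rows:
        counts[tuple(row[k] for k in keys)] += 1
    return counts


# Parse the sample text as CSV; a real file would be opened with open(path, newline="").
rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Tabulate cases by county and year of diagnosis.
by_county_year = count_cases(rows, ["county", "year_of_diagnosis"])

for (county, year), n in sorted(by_county_year.items()):
    print(county, year, n)
```

Passing a different list of keys, for example `["site", "sex"]`, gives a different cross-tabulation without changing the function.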
Prospective Registry for Myeloid Mutations
Other Data Resources
MarketScan
- Description
- From IBM: The IBM® MarketScan® Research Databases are a family of research data sets that integrate de-identified patient-level health data (medical, drug and dental), productivity (workplace absence, short- and long-term disability and workers’ compensation), laboratory results, health risk assessments (HRAs), hospital discharges and electronic medical records (EMRs) into data sets available for healthcare research. Data are contributed by large employers, managed care organizations, hospitals, EMR providers, Medicare and Medicaid.
- Link to website: IBM MarketScan Research Databases for Life Sciences Researchers
- Data Ark description of IBM® MarketScan®: IBM Market Scan | Scientific Computing and Data (mssm.edu)
- Example publications using MarketScan data:
- Blank, L. J., Agarwal, P., Kwon, C. S., & Jetté, N. (2023). Association of first anti-seizure medication choice with injuries in older adults with newly diagnosed epilepsy. Seizure, 109, 20–25. https://doi.org/10.1016/j.seizure.2023.05.0060
PROSPR
- Description
- Population-based Research to Optimize the Screening Process (PROSPR) is an NCI-funded research network including 10 healthcare delivery systems across the United States.
- PROSPR’s goal is to better understand how to improve the cancer screening process (recruitment, screening, diagnosis, referral for treatment) in community healthcare settings in the United States.
- Each PROSPR Research Center (PRC) conducts organ-specific research, and also participates with other PRCs to conduct trans-PROSPR research projects.
- Website: Population-based Research to Optimize the Screening Process (PROSPR) (cancer.gov)
- Current funding cycle: PROSPR 2 (actively funded since 2018), studying cervical, colorectal, and lung cancer.
- Webinar: Population-based Research to Optimize the Screening Process (PROSPR) Initiative: Research Activities and Data Sharing Webinar (cancer.gov)
- Accessing
- PROSPR DataShare (PDS)
- Use PROSPR DataShare
- Public use data sets
- HIPAA-defined de-identified data sets available to any requestor who agrees to data use conditions
- Turnaround time of about 2 weeks.
- Restricted use data sets
- Custom HIPAA-defined limited data sets
- Generally requires collaboration with PROSPR investigators
- 2-step review process
- Submit an inquiry
- If approved, submit a full proposal
- Cost depends on involvement of PROSPR investigators/staff
- Timeline
- Review typically takes about three to four months
- Data set delivery typically takes 1-2 months.
- Example publications using PROSPR:
- Burnett-Hartman, A. N., Carroll, N. M., Honda, S. A., Joyce, C., Mitra, N., Neslund-Dudas, C., Olaiya, O., Rendle, K. A., Schnall, M. D., Vachani, A., & Ritzwoller, D. P. (2022). Community-based Lung Cancer Screening Results in Relation to Patient and Radiologist Characteristics: The PROSPR Consortium. Annals of the American Thoracic Society, 19(3), 433–441. https://doi.org/10.1513/AnnalsATS.202011-1413OC
- Ritzwoller, D. P., Meza, R., Carroll, N. M., Blum-Barnett, E., Burnett-Hartman, A. N., Greenlee, R. T., Honda, S. A., Neslund-Dudas, C., Rendle, K. A., & Vachani, A. (2021). Evaluation of Population-Level Changes Associated With the 2021 US Preventive Services Task Force Lung Cancer Screening Recommendations in Community-Based Health Care Systems. JAMA network open, 4(10), e2128176. https://doi.org/10.1001/jamanetworkopen.2021.28176
DREAM EMR Data
- Description
- DREAM is a volunteer organization of more than 30,000 solvers focused on crowd-sourced challenges to benchmark informatic algorithms in biomedicine.
- Website: DREAM Challenges
- Video overview: DREAM Challenges Overview – YouTube
- Example Publications using DREAM Challenges
MGB-EMR Data
- Description
- Links electronic health records from all Mass General Brigham facilities with multiple long-term outcome and health care utilization data sources, including insurance claims data, the National Death Index (through the Centers for Medicare & Medicaid Services), and state death data.
Additional Data Sources
NCI Cancer Research Data Commons (CRDC)
NIH Resources for Researchers Search
NIH Cancer Data Access System (“CDAS”)
From NIH:
- Genotype and phenotype datasets deposited in the database of Genotypes and Phenotypes (dbGaP);
- Gene expression profiles deposited in Gene Expression Omnibus (GEO) and high-throughput functional genomics experiments and assays deposited in the ArrayExpress archive;
- DNA/RNA binding and DNA accessibility/methylation experiments deposited in the Encyclopedia of DNA Elements (ENCODE);
- Genotype and RNA-seq data across tissue sites and cell lines deposited in the Genotype-Tissue Expression (GTEx) and Developmental GTEx (dGTEx) projects;
- Proteomic data measured by mass spectrometry in cancer biospecimens from CPTAC and ICPC deposited in the Proteomic Data Commons (PDC);
- Sequencing data from human samples deposited in the NCBI Short Read Archive (SRA) and in the NCI Cancer Data Service (CDS);
- Harmonized cancer genomic datasets deposited in the NCI Genomic Data Commons (GDC);
- Imaging information linked to clinical and genomic data across cancer sites available in The Cancer Imaging Archive (TCIA) and the NCI Imaging Data Commons (IDC);
- Data on therapy outcomes from clinical trials and patient registries such as the Pediatric Proton Consortium Registry (PPCR);
- Epidemiological, clinical, and molecular data from established cancer cohorts such as the follow-up of the Prostate, Lung, Colorectal and Ovarian (PLCO) Trial and others in the Cancer Epidemiology Descriptive Cohort Data (CEDCD), as well as from newer cancer cohorts such as the Connect for Cancer Prevention Study;
- Data from the NCI funded research network of US healthcare delivery systems Population-based Research to Optimize the Screening Process (PROSPR) DataShare, the Patterns of Care (POC) initiative, the Social Determinants of Health Dataset, and other healthcare delivery datasets;
- Population-level health survey data such as the National Health Interview Survey (NHIS), the National Health and Nutrition Examination Survey (NHANES), the Behavioral Risk Factor Surveillance System (BRFSS), the Medical Expenditure Panel Survey (MEPS), the Health Information National Trends Survey (HINTS), the Tobacco Use Supplement to the Current Population Survey (TUS-CPS), and TUS-CPS datasets linked to other US Census Bureau research datasets and the National Death Index registry (e.g., the Tobacco Longitudinal Mortality Study (TLMS) and sub-linkages);
- Behavioral and social science data in the Inter-university Consortium for Political and Social Research (ICPSR) data repository;
- Data available through the Data and Specimen Hub (DASH), including the Environmental influences on Child Health Outcomes (ECHO);
- The NIH Common Fund-supported Gabriella Miller Kids First Data Resource Center and the trans-NIH INvestigation of Co-occurring conditions across the Lifespan to Understand Down syndromE (INCLUDE) Data Coordinating Center Data Hub;
- Genetic, lifestyle, and exposure data from Veterans partners in the Million Veteran Program (MVP);
- Genomic, other ‘omic, and phenotype data available through NHGRI’s Genomic Data Science Analysis, Visualization, and Informatics Lab-Space (AnVIL);
- Other datasets, including those listed in the NIH resources page and HHS data hub page.