De-Identification Service

The De-Identification Service removes or masks all potential identifiers in a data set, using one of three methods:

HIPAA Safe Harbor

Removes all data elements listed in 45 CFR 164.514(b)(2)(i), truncates ZIP codes to their first three digits, and aggregates all ages over 89 into a single category.
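The ZIP code and age rules above can be sketched as two small functions. This is a minimal illustration, not the MSDW implementation: the function names are hypothetical, and the set of restricted three-digit prefixes (areas with 20,000 or fewer people, which HHS guidance says must be replaced with "000") is passed in by the caller rather than hard-coded. The prefix "036" used below is illustrative only.

```python
from typing import AbstractSet, Union


def safe_harbor_zip(zip_code: str, restricted_prefixes: AbstractSet[str]) -> str:
    """Keep only the first three digits of a ZIP code.

    Prefixes covering 20,000 or fewer people (supplied by the caller;
    hypothetical input) are replaced with "000".
    """
    prefix = zip_code.strip()[:3]
    return "000" if prefix in restricted_prefixes else prefix


def safe_harbor_age(age: int) -> Union[int, str]:
    """Return ages up to 89 unchanged; aggregate anything older into "90+"."""
    return "90+" if age > 89 else age


print(safe_harbor_zip("10029-6574", restricted_prefixes={"036"}))  # 100
print(safe_harbor_zip("03601", restricted_prefixes={"036"}))       # 000
print(safe_harbor_age(93))                                          # 90+
```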

Hripcsak’s Shift-and-Truncate (SANT) Method

Obscures date information by shifting and truncating all dates within a patient record. Read more: Hripcsak et al., “Preserving temporal relations in clinical data while maintaining privacy.”
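A simplified sketch of the shift-and-truncate idea follows. It assumes, for illustration only, that every date in one patient's record is moved by a single random per-patient offset and that shifted dates earlier than a fixed cutoff are then dropped; the offset range, the cutoff, and the function name are hypothetical, and the paper by Hripcsak et al. should be consulted for the method's actual parameters.

```python
import random
from datetime import date, timedelta
from typing import List, Optional


def shift_and_truncate(
    event_dates: List[date],
    cutoff: date,
    max_shift_days: int = 365,
    rng: Optional[random.Random] = None,
) -> List[date]:
    """Shift one patient's dates by a shared random offset, then drop
    any shifted date earlier than `cutoff`.

    Simplified illustration; offset range and cutoff are assumptions.
    """
    rng = rng or random.Random()
    # One offset per patient keeps the intervals between that patient's events intact.
    offset = timedelta(days=rng.randint(-max_shift_days, max_shift_days))
    shifted = sorted(d + offset for d in event_dates)
    # Truncation: discard events before the cutoff (e.g. early dates that could reveal age).
    return [d for d in shifted if d >= cutoff]
```

Because a single offset is applied per patient, the number of days between any two retained events is unchanged; only the absolute calendar position is hidden.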

"Elapsed Days" Approach

Defines a “time zero” (T0) for each patient, converts all dates to the number of days elapsed since T0, and optionally adds a “jitter” of ± a few days to each date. A small jitter should have little effect on aggregate statistics.
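The elapsed-days conversion can be sketched as below. How T0 is chosen (for example, a patient's first encounter) is an assumption left to the caller, and the jitter here is drawn independently and uniformly for each date; the function name and the `jitter_days` parameter are hypothetical.

```python
import random
from datetime import date
from typing import List, Optional


def to_elapsed_days(
    event_dates: List[date],
    t0: date,
    jitter_days: int = 0,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Convert dates to days elapsed since the patient's time zero (T0).

    If jitter_days > 0, each value gets independent uniform integer noise
    in [-jitter_days, +jitter_days].
    """
    rng = rng or random.Random()
    result = []
    for d in event_dates:
        elapsed = (d - t0).days
        if jitter_days > 0:
            elapsed += rng.randint(-jitter_days, jitter_days)
        result.append(elapsed)
    return result


print(to_elapsed_days([date(2021, 3, 1), date(2021, 3, 15)], t0=date(2021, 3, 1)))  # [0, 14]
```

Because the jitter is applied per date rather than per patient, intervals between events can change by up to twice `jitter_days`, and closely spaced events may swap order; that trade-off is what keeps the effect on statistics small but nonzero.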

De-Identified Data Elements

The following 19 data elements are de-identified in every data set produced, in accordance with the Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule.

  • Names
  • All ages over 89
  • All elements of dates (except year), including dates of birth, admission, discharge, or death
  • Street address, city, county, and ZIP code (the first three digits of the ZIP code may be used if the area formed by combining all ZIP codes that share those three digits contains more than 20,000 people)
  • All telephone numbers
  • Fax numbers
  • E-mail addresses
  • Social Security numbers (SSN)
  • Medical record numbers (MRN)
  • Health plan beneficiary numbers
  • Account numbers
  • Certificate/license numbers
  • Vehicle identifiers, including license plate numbers
  • Device identifiers and/or serial numbers
  • Uniform Resource Locators (URLs)
  • Internet Protocol (IP) addresses
  • Biometric identifiers, including finger and voice prints
  • Full-face photographic images and any comparable images
  • Any other unique identifying number, characteristic, or code
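Applied to a single flat record, part of the element list above might look like the sketch below: fields holding direct identifiers are dropped and every date is reduced to its year. The field names are hypothetical, and the sketch deliberately covers only those two rules; ZIP truncation, ages over 89 (including birth years that reveal such ages), images, biometrics, and free text would all need separate handling.

```python
from datetime import date
from typing import Any, Dict

# Hypothetical field names standing in for the identifier categories listed above.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "street_address", "city", "county", "phone", "fax", "email",
    "ssn", "mrn", "health_plan_id", "account_number", "license_number",
    "vehicle_id", "device_serial", "url", "ip_address",
}


def strip_identifiers(record: Dict[str, Any]) -> Dict[str, Any]:
    """Drop direct-identifier fields and reduce every date value to its year."""
    cleaned: Dict[str, Any] = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIER_FIELDS:
            continue  # identifier removed entirely
        if isinstance(value, date):  # also matches datetime, a subclass of date
            cleaned[field] = value.year  # keep the year only
        else:
            cleaned[field] = value
    return cleaned


print(strip_identifiers({"mrn": "12345", "admit_date": date(2019, 6, 2), "dx_code": "E11.9"}))
# {'admit_date': 2019, 'dx_code': 'E11.9'}
```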

De-Identification Process

  • The requestor provides a data dictionary or definition of all data elements, particularly those that could contain patient identifiers or masked identifiers
  • An estimate is provided to the requestor
  • Hourly rate: $236 (time rounded to the nearest 0.25 hour)
  • Billing is initiated once de-identification and validation of the de-identified data set are complete
  • The de-identified data set is provided to the user with a form confirming that the set contains no HIPAA identifiers
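The billing rule in the list above amounts to a short calculation. This sketch assumes that time falling exactly halfway between two quarter hours rounds up, which the rate description does not specify; the function names are hypothetical. `Decimal` is used so that dollar amounts are computed exactly rather than with binary floating point.

```python
from decimal import Decimal, ROUND_HALF_UP

HOURLY_RATE = Decimal("236")  # USD per hour, as stated above
QUARTER_HOUR = Decimal("0.25")


def billable_hours(hours_worked: Decimal) -> Decimal:
    """Round worked time to the nearest 0.25 hour (ties round up: an assumption)."""
    quarters = (hours_worked / QUARTER_HOUR).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return quarters * QUARTER_HOUR


def charge(hours_worked: Decimal) -> Decimal:
    """Billable hours times the hourly rate, in dollars and cents."""
    return (billable_hours(hours_worked) * HOURLY_RATE).quantize(Decimal("0.01"))


print(charge(Decimal("1.3")))  # 1.3 h rounds to 1.25 h -> 295.00
```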

Once de-identification is complete, all date elements in the data set are shifted by the same amount for each patient, using the date shift value stored in the ‘masked mrn table’. This keeps the relative distance between dates in each patient’s chronology intact.
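The per-patient shift can be sketched as a lookup followed by one addition per date. The dictionary below is a hypothetical stand-in for the ‘masked mrn table’: its keys, shift values, and layout are invented for illustration and do not describe the actual table.

```python
from datetime import date, timedelta
from typing import Dict, List

# Hypothetical stand-in for the 'masked mrn table': masked MRN -> date shift in days.
DATE_SHIFT_BY_MASKED_MRN: Dict[str, int] = {"M0001": -143, "M0002": 27}


def shift_patient_dates(masked_mrn: str, event_dates: List[date]) -> List[date]:
    """Apply the patient's single date-shift value to every date in their record."""
    shift = timedelta(days=DATE_SHIFT_BY_MASKED_MRN[masked_mrn])
    return [d + shift for d in event_dates]


visits = [date(2020, 1, 10), date(2020, 2, 9)]
shifted = shift_patient_dates("M0002", visits)
print((visits[1] - visits[0]).days, (shifted[1] - shifted[0]).days)  # 30 30
```

The interval between the two visits is 30 days before and after shifting, which is the property the paragraph above describes.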

For more information, contact the MSDW team.

Please note that the MSDW team does not currently offer de-identified health information from free-text fields, such as progress notes. This data source will be revisited as de-identification techniques evolve.