Figures & data
Table 1. Summary of de-identification systems. Machine learning indicates systems that use machine learning techniques only; Hybrid indicates systems that use a combination of machine learning techniques and handcrafted rules.
Table 2. De-identification systems with overall F-measure 0.95.
Table 3. De-identification systems with overall F-measure 0.95.
Table 4. F-measure ≥ 0.95 for HIPAA categories for de-identification. Where the F-measure did not reach 0.95, the highest score is presented. The following PHI categories were not present in any of the datasets and are therefore not included here: CONTACT (URL and IP address); ID (BioID, health plan, Social Security number, and vehicle license plate number); face photo; and any other unique code.
Table 5. Techniques used to achieve the F-measures presented for the HIPAA PHI categories.
Table 6. Best F-measures for the additional i2b2 de-identification categories. This table covers categories that fall outside the HIPAA PHI categories but were introduced as additional categories in the i2b2 competitions (Stubbs et al. 2015; Stubbs and Uzuner 2015a). It also lists the techniques used to achieve these F-measures.
Table 7. Common practices used in surrogate generation and replacement of PHI, as outlined by Johnson et al. (2016), Pantazos, Lauesen, and Lippert (2017), Stubbs and Uzuner (2015b), and Stubbs et al. (2015b).