Beyond Accuracy: Automated De-Identification of Large Real-World Clinical Text Datasets.
Veysel Kocaman, Hasham Ul Haq, David Talby. Published in: CoRR (2023)
Keyphrases
- real world
- high accuracy
- data sets
- synthetic datasets
- information retrieval
- synthetic data
- case study
- fully automated
- free text
- classification accuracy
- database
- co-occurrence
- text data
- prediction accuracy
- web documents
- computational cost
- real life
- training set
- text documents
- raw data
- text retrieval
- computational complexity
- natural language
- wide range
- feature selection
- textual data
- text collections