Ground-Truth, Whose Truth? - Examining the Challenges with Annotating Toxic Text Datasets.
Kofi Arhin, Ioana Baldini, Dennis Wei, Karthikeyan Natesan Ramamurthy, Moninder Singh
Published in: CoRR (2021)
Keyphrases
- ground truth
- text data
- database
- text documents
- key issues
- lessons learned
- ground truth data
- text mining
- benchmark datasets
- information retrieval
- training dataset
- semantic annotation
- gold standard
- text collections
- technical challenges
- keywords
- manually labeled
- text retrieval
- free text
- key concepts
- text classification
- textual information
- automatically extracted
- string matching
- document analysis
- air pollution