A little goes a long way: Improving toxic language classification despite data scarcity.
Mika Juuti, Tommi Gröndahl, Adrian Flanagan, N. Asokan
Published in: CoRR (2020)
Keyphrases
- data collection
- data sources
- data sets
- small number
- original data
- synthetic data
- support vector machine (svm)
- text classification
- database
- probability distribution
- training set
- data analysis
- training data
- programming language
- missing data
- high dimensional data
- feature selection
- training samples
- data distribution
- input data
- image data
- classification accuracy
- model selection
- language learning
- neural network
- classification method
- data quality
- data reduction
- machine learning
- preprocessing
- feature extraction
- supervised learning
- data structure
- pattern recognition
- active learning