Go Simple and Pre-Train on Domain-Specific Corpora: On the Role of Training Data for Text Classification.
Aleksandra EdwardsJosé Camacho-ColladosHélène de RibaupierreAlun D. PreecePublished in: COLING (2020)
Keyphrases
- text classification
- domain specific
- training data
- training corpus
- general purpose
- training set
- labeled training data
- bag of words
- decision trees
- naive bayes
- machine learning
- labeled data
- learning algorithm
- natural language processing
- text mining
- text data
- text documents
- image classification
- supervised learning
- classification accuracy
- data cleaning
- training documents
- text classifiers
- semantic features
- data sets
- specific domains
- training dataset
- domain independent
- multi label
- n gram
- training examples
- prior knowledge
- feature selection
- data mining