IGOT: Information Gain Optimized Tokenizer on Domain Adaptive Pretraining.
Dawei Feng, Yihai Zhang, Zhixuan Xu
Published in: CoRR (2024)
Keyphrases
- information gain
- decision trees
- feature selection
- text categorization
- mutual information
- chi square
- chi squared
- occurrence frequency
- naive bayes
- document frequency
- feature selection for text categorization
- image processing
- unsupervised learning
- text classification
- semi supervised
- decision tree learners
- machine learning
- data mining