Getting the most out of your tokenizer for pre-training and domain adaptation.
Gautier Dagan, Gabriel Synnaeve, Baptiste Rozière
Published in: CoRR (2024)
Keyphrases
- domain adaptation
- covariate shift
- cross domain
- multiple sources
- sentiment classification
- test data
- training set
- transfer learning
- semi supervised learning
- semi supervised
- sufficient training data
- model selection
- test set
- labeled data
- training process
- information retrieval
- manifold alignment
- databases
- data sets
- unsupervised learning
- document classification
- data analysis
- reinforcement learning
- training and test data