Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models.
Steve Yadlowsky, Lyric Doshi, Nilesh Tripuraneni
Published in: CoRR (2023)
Keyphrases
- model selection
- data sets
- cross validation
- prior knowledge
- statistical inference
- bayesian methods
- missing data
- model selection criteria
- hypothesis tests
- mixture model
- parameter estimation
- selection criterion
- high dimensional data
- bayesian learning
- automatic model selection
- hyperparameters
- variable selection
- incomplete data
- statistical model
- generative model
- probability distribution
- probabilistic model
- objective function
- machine learning