Transformers can optimally learn regression mixture models.
Reese Pathak, Rajat Sen, Weihao Kong, Abhimanyu Das
Published in: ICLR (2024)
Keyphrases
- mixture model
- model selection
- Gaussian mixture model
- EM algorithm
- density estimation
- generative model
- probabilistic model
- mixture modeling
- language model
- regression model
- expectation maximization
- unsupervised learning
- probability density function
- machine learning
- cross validation
- maximum likelihood
- finite mixtures
- automatic model selection
- log-linear models
- active learning
- model based clustering
- data mining
- Dirichlet prior