Reward-Biased Maximum Likelihood Estimation for Linear Stochastic Bandits.
Yu-Heng Hung, Ping-Chun Hsieh, Xi Liu, P. R. Kumar
Published in: CoRR (2020)
Keyphrases
- maximum likelihood estimation
- maximum likelihood
- stochastic systems
- em algorithm
- parameter estimation
- multi-armed bandit
- probability distribution
- expectation maximization
- multivariate gaussian
- reinforcement learning
- regret bounds
- density function
- mixture of gaussians
- conjugate gradient
- minimum classification error
- proximal point