Exploration Through Reward Biasing: Reward-Biased Maximum Likelihood Estimation for Stochastic Multi-Armed Bandits.
Xi Liu
Ping-Chun Hsieh
Yu-Heng Hung
Anirban Bhattacharya
P. R. Kumar
Published in:
ICML (2020)
Keyphrases
maximum likelihood estimation
multi armed bandits
bandit problems
multi armed bandit
em algorithm
reinforcement learning
maximum likelihood
parameter estimation
probability distribution
decision problems
probability density
action selection
multivariate gaussian
expectation maximization
mixture model
long run