Reinforcement Learning: a Comparison of UCB Versus Alternative Adaptive Policies.

Wesley Cowan, Michael N. Katehakis, Daniel Pirutinsky

Published in: CoRR (2019)

Keyphrases

reinforcement learning
optimal policy
policy search
learning algorithm
control policies
Markov decision processes
Markov decision process
machine learning
optimal control
reward function
reinforcement learning algorithms
multi-armed bandit
policy gradient methods