Performance Investigation of UCB Policy in Q-learning.
Koki Saito, Akira Notsu, Seiki Ubukata, Katsuhiro Honda
Published in: ICMLA (2015)
Keyphrases
- optimal policy
- action selection
- reinforcement learning
- cooperative
- function approximation
- multi agent
- learning algorithm
- policy iteration
- state space
- Markov decision processes
- state action
- continuous state spaces
- reward function
- machine learning
- asymptotically optimal
- decision problems
- state dependent
- infinite horizon
- actor critic
- Markov decision process
- approximate policy iteration
- discounted reward
- policy gradient
- agent receives
- stochastic approximation
- reinforcement learning algorithms
- model free
- learning rate
- evaluation function
- convergence rate
- dynamic programming