Discounted UCB1-tuned for Q-learning.
Koki Saito
Akira Notsu
Katsuhiro Honda
Published in:
SCIS&ISIS (2014)
Keyphrases
optimal policy
Markov decision processes
reinforcement learning
state space
discounted reward
multi agent
function approximation
decision problems
cooperative
infinite horizon
dynamic programming
reinforcement learning algorithms
bandit problems
finite horizon
policy iteration
average reward
Markov decision process
model free
multi armed bandit
action selection
cash flow
stochastic approximation
learning algorithm
reward function
finite state
average cost
credit assignment
state action
sufficient conditions
partially observable Markov decision processes
long run
linear programming