Q-Learning for Bandit Problems

Michael O. Duff

Published in: ICML (1995)

Keyphrases

bandit problems
decision problems
reinforcement learning
multi-armed bandits
optimal policy
function approximation
state space
cooperative
multi-agent
exploration-exploitation
learning algorithm
action selection
stochastic approximation
reinforcement learning algorithms
model-free
decentralized decision-making
multi-agent reinforcement learning
dynamic programming
learning rate
expected utility
multi-armed bandit problems
decision makers
special case
active learning
objective function
artificial intelligence