On the Existence of Fixed Points for Q-Learning and Sarsa in Partially Observable Domains.

Theodore J. Perkins, Mark D. Pendrith

Published in: ICML (2002)

Keyphrases

fixed point
partially observable domains
reinforcement learning
temporal difference learning
policy iteration
reinforcement learning algorithms
function approximation
partially observable
state space
dynamical systems
inverse reinforcement learning
model free
temporal difference
sufficient conditions
action selection
optimal policy
sensing actions
supervised learning
action models
transfer learning
partially observable markov decision processes
learning algorithm
function approximators
markov decision process
multi agent
markov decision processes
belief propagation
monte carlo
dynamic programming
game playing