Off-Policy Reinforcement Learning with Delayed Rewards.

Beining Han, Zhizhou Ren, Zuofan Wu, Yuan Zhou, Jian Peng

Published in: CoRR (2021)

Keyphrases

reinforcement learning
Markov decision processes
function approximation
reinforcement learning algorithms
state space
model free
reward function
temporal difference
reward shaping
robotic control
Markov decision process
optimal policy
multi agent
transfer learning
optimal control
learning process
learning algorithm
partially observable
machine learning
dynamic programming
action selection
multi agent systems
reinforcement learning methods
multi agent reinforcement learning
policy search
real time
database