Exploiting Reward Shifting in Value-Based Deep RL.

Hao Sun, Lei Han, Rui Yang, Xiaoteng Ma, Jian Guo, Bolei Zhou

Published in: CoRR (2022)

Keyphrases

reinforcement learning
function approximation
policy gradient
state space
optimal policy
machine learning
reward function
model free
Markov decision processes
multi agent
temporal difference
RL algorithms
control policy
total reward
eligibility traces
long run
neural network
learning algorithm
complex domains
learning agent
average reward
deep learning
reinforcement learning methods
transfer learning
action selection
learning classifier systems
active learning