The Value of Reward Lookahead in Reinforcement Learning.

Nadav Merlis, Dorian Baudry, Vianney Perchet

Published in: CoRR (2024)

Keyphrases

reinforcement learning
function approximation
state space
reinforcement learning algorithms
eligibility traces
reward function
machine learning
learning algorithm
action selection
model-free
optimal policy
reward shaping
multi-agent
average reward
optimal control
partially observable
learning agent
reinforcement learning methods
policy search
robotic control
total reward
partially observable environments
control policy
temporal difference
learning problems
transfer learning
supervised learning