Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward

Washim Uddin Mondal, Vaneet Aggarwal

Published in: CoRR (2023)

Keyphrases

reinforcement learning
function approximation
state space
model free
eligibility traces
learning algorithm
Markov decision processes
reward function
reinforcement learning algorithms
temporal difference
dynamic programming
total reward
multi agent
learning agent
learning process
supervised learning
reward shaping
learning problems
transfer learning
optimal policy
peer to peer
action selection
single agent
state action
reinforcement learning methods
robotic control