Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward.

Washim Uddin Mondal, Vaneet Aggarwal

Published in: Trans. Mach. Learn. Res. (2023)

Keyphrases

reinforcement learning
state space
function approximation
eligibility traces
model-free
reinforcement learning algorithms
temporal difference
partially observable environments
reward function
optimal policy
dynamic programming
multi-agent
learning algorithm
learning problems
reward shaping
action selection
peer-to-peer
machine learning
average reward
reinforcement learning methods
policy gradient
multi-agent reinforcement learning
partially observable
data sets
total reward
Markov decision process
sufficient conditions
supervised learning
Markov decision processes