Approximate Difference Rewards for Scalable Multiagent Reinforcement Learning.

Arambam James Singh, Akshat Kumar, Hoong Chuin Lau

Published in: AAMAS (2021)

Keyphrases

reinforcement learning
function approximation
policy evaluation
markov decision processes
state space
reinforcement learning algorithms
learning algorithm
reward shaping
temporal difference
model free
optimal policy
reward function
continuous state
machine learning
highly scalable
reinforcement learning methods
learning process
web scale
multiarmed bandit
exact solution
transfer learning
lower bound
convex functions
complex domains
markov decision process
policy iteration
rl algorithms
dynamic programming
neural network
data sets