Difference Rewards Policy Gradients.
Jacopo Castellini
Sam Devlin
Frans A. Oliehoek
Rahul Savani
Published in:
CoRR (2020)
Keyphrases
reward function
Markov decision processes
optimal policy
reinforcement learning
control policy
fully observable
expected reward
case study
policy making
bandit problems
discounted reward
image gradient
asymptotically optimal
average reward
decision process
multi-armed bandit
long term and short term