Difference Rewards Policy Gradients.
Jacopo Castellini, Sam Devlin, Frans A. Oliehoek, Rahul Savani
Published in: AAMAS (2021)
Keyphrases
- reward function
- reinforcement learning
- optimal policy
- control policy
- expected reward
- Markov decision processes
- information technology
- state space
- asymptotically optimal
- fully observable
- finite horizon
- reinforcement learning algorithms
- long term and short term
- total reward
- policy iteration
- real time
- infinite horizon
- machine learning