Reward Scale Robustness for Proximal Policy Optimization via DreamerV3 Tricks.
Ryan Sullivan, Akarsh Kumar, Shengyi Huang, John P. Dickerson, Joseph Suarez
Published in: NeurIPS (2023)
Keyphrases
- partially observable environments
- optimization algorithm
- reinforcement learning
- optimal policy
- optimization problems
- reward function
- policy gradient
- global optimization
- optimization process
- inverse reinforcement learning
- combinatorial optimization
- constrained optimization
- Markov decision processes
- optimization method
- optimization model
- control policy
- average reward
- direct search
- expected reward
- neural network