Reward Scale Robustness for Proximal Policy Optimization via DreamerV3 Tricks.
Ryan Sullivan, Akarsh Kumar, Shengyi Huang, John P. Dickerson, Joseph Suarez
Published in: CoRR (2023)
Keyphrases
- optimization algorithm
- reinforcement learning
- optimization problems
- optimization method
- partially observable environments
- optimal policy
- neural network
- expected reward
- average reward
- global optimization
- long run
- reward function
- computational efficiency
- inverse reinforcement learning
- scale space
- optimization process
- optimization model
- action selection
- policy gradient
- total reward
- data sets