Continuously Discovering Novel Strategies via Reward-Switching Policy Optimization.
Zihan Zhou, Wei Fu, Bingliang Zhang, Yi Wu
Published in: ICLR (2022)
Keyphrases
- optimization strategies
- optimization algorithm
- real time
- reinforcement learning
- global optimization
- optimal policy
- control policy
- partially observable environments
- optimization problems
- optimization methods
- total reward
- direct search
- policy gradient
- average reward
- optimal strategy
- optimization method
- least squares
- machine learning