Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation.
Xiaoying Zhang, Jean-Francois Ton, Wei Shen, Hongning Wang, Yang Liu
Published in: CoRR (2024)
Keyphrases
- lightweight
- optimization algorithm
- reinforcement learning
- partially observable environments
- robust optimization
- multi-agent
- inverse reinforcement learning
- wireless sensor networks
- long run
- optimization problems
- DoS attacks
- average reward
- policy gradient
- utility function
- RFID tags
- reward function
- Markov decision process
- total reward
- evolutionary algorithm