Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses.
Haipeng Luo, Chen-Yu Wei, Chung-Wei Lee. Published in: NeurIPS (2021)
Keyphrases
- optimal policy
- Markov decision processes
- Markov decision process
- reinforcement learning
- optimization algorithm
- Markov decision problems
- optimization problems
- finite horizon
- state space
- average reward
- policy search
- global optimization
- policy iteration
- action selection
- model based reinforcement learning
- evolutionary search
- access control
- average cost
- partially observable
- infinite horizon
- optimization process
- multi objective
- multi agent
- genetic algorithm