Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses.
Haipeng Luo, Chen-Yu Wei, Chung-Wei Lee
Published in: CoRR (2021)
Keyphrases
- optimal policy
- Markov decision processes
- Markov decision problems
- policy iteration
- policy search
- reinforcement learning
- Markov decision process
- model based reinforcement learning
- global optimization
- finite horizon
- reinforcement learning problems
- action selection
- partially observable
- optimization algorithm
- linear programming
- optimization problems
- state and action spaces
- average cost
- factored MDPs
- design space exploration
- approximate dynamic programming
- multi agent
- reward function
- optimization process
- state space
- dynamic programming
- reinforcement learning algorithms
- action space
- decision processes
- average reward
- grey level
- model free
- optimization method
- Markov chain