Exploration Bonus for Regret Minimization in Discrete and Continuous Average Reward MDPs.
Jian Qian, Ronan Fruit, Matteo Pirotta, Alessandro Lazaric
Published in: NeurIPS (2019)
Keyphrases
- average reward
- Markov decision processes
- regret minimization
- continuous state spaces
- semi-Markov decision processes
- optimal policy
- long run
- action space
- state space
- reinforcement learning
- stochastic games
- discounted reward
- policy iteration
- optimality criterion
- Nash equilibrium
- model free
- partially observable Markov decision processes
- game theoretic
- state and action spaces
- Markov chain
- RL algorithms
- state action
- hierarchical reinforcement learning
- multi agent learning
- policy gradient
- Markov decision process
- computational complexity
- finite number
- dynamic programming
- decision making
- heuristic search
- average cost
- infinite horizon
- finite state