A Duality Approach for Regret Minimization in Average-Award Ergodic Markov Decision Processes.
Hao Gong, Mengdi Wang
Published in: L4DC (2020)
Keyphrases
- markov decision processes
- regret minimization
- average cost
- discounted reward
- finite state
- reinforcement learning
- state space
- optimal policy
- transition matrices
- policy iteration
- dynamic programming
- decision theoretic planning
- finite horizon
- game theoretic
- markov chain
- reachability analysis
- planning under uncertainty
- partially observable
- reinforcement learning algorithms
- linear programming
- state and action spaces
- action space
- average reward
- infinite horizon
- model based reinforcement learning
- stationary policies
- markov decision process
- nash equilibrium
- machine learning
- stochastic games
- action sets
- multi agent learning
- probabilistic planning
- long run