Achieving Tractable Minimax Optimal Regret in Average Reward MDPs.
Victor Boone, Zihan Zhang
Published in: CoRR (2024)
Keyphrases
- average reward
- total reward
- markov decision processes
- optimal policy
- long run
- optimality criterion
- discounted reward
- semi markov decision processes
- reinforcement learning
- worst case
- reward function
- stochastic games
- policy iteration
- finite state
- model free
- discount factor
- markov chain
- minimax regret
- dynamic programming
- state space
- state and action spaces
- decision problems
- hierarchical reinforcement learning
- state action
- lower bound
- finite horizon
- markov decision process
- partially observable markov decision processes
- average cost
- np hard
- partially observable
- stationary policies
- infinite horizon
- evaluation function
- linear programming
- optimal solution