Restless Bandits with Average Reward: Breaking the Uniform Global Attractor Assumption.
Yige Hong, Qiaomin Xie, Yudong Chen, Weina Wang
Published in: NeurIPS (2023)
Keyphrases
- average reward
- long run
- Markov decision processes
- optimal policy
- semi-Markov decision processes
- stochastic games
- discounted reward
- model free
- semi-Markov
- domain independent
- reinforcement learning
- policy iteration
- cost function
- optimal control
- fixed point
- Markov chain
- state space
- probabilistic model
- optimality criterion
- stochastic systems
- multi agent
- machine learning