Restless Bandits with Average Reward: Breaking the Uniform Global Attractor Assumption.
Yige Hong, Qiaomin Xie, Yudong Chen, Weina Wang
Published in: CoRR (2023)
Keyphrases
- average reward
- markov decision processes
- optimal policy
- long run
- optimal control
- stochastic games
- reinforcement learning
- discounted reward
- optimality criterion
- semi markov
- model free
- semi markov decision processes
- sample path
- policy iteration
- least squares
- stochastic systems
- total reward
- data mining
- fixed point
- markov chain
- dynamic programming
- machine learning