Learning Infinite-Horizon Average-Reward Restless Multi-Action Bandits via Index Awareness.
Guojun Xiong
Shufan Wang
Jian Li
Published in:
NeurIPS (2022)
Keyphrases
infinite horizon
stochastic games
optimal policy
Markov decision processes
optimal control
learning algorithm
reinforcement learning
average reward
long run
state action
data mining
search algorithm
Monte Carlo
policy gradient