Reinforcement Learning for Finite-Horizon Restless Multi-Armed Multi-Action Bandits.
Guojun Xiong
Jian Li
Rahul Singh
Published in:
CoRR (2021)
Keyphrases
reinforcement learning
finite horizon
optimal policy
optimal control
Markov decision processes
multi agent
random walk
multistage
action selection