Q-Learning Lagrange Policies for Multi-Action Restless Bandits.
Jackson A. Killian, Arpita Biswas, Sanket Shah, Milind Tambe
Published in: CoRR (2021)
Keyphrases
- optimal policy
- action selection
- discounted reward
- cooperative
- reinforcement learning
- state space
- learning algorithm
- multi agent
- markov decision process
- state action
- markov decision processes
- function approximation
- hierarchical reinforcement learning
- semi markov
- stochastic systems
- conservation laws
- stochastic approximation
- average reward
- reinforcement learning algorithms
- linear regression
- dynamic programming
- finite state
- learning rate
- multiagent reinforcement learning
- optimal control