Q-Learning Lagrange Policies for Multi-Action Restless Bandits.
Jackson A. Killian, Arpita Biswas, Sanket Shah, Milind Tambe. Published in: KDD (2021)
Keyphrases
- optimal policy
- action selection
- discounted reward
- cooperative
- optimal control
- reinforcement learning
- function approximation
- multi agent
- state space
- semi-Markov
- Markov decision processes
- reinforcement learning algorithms
- learning algorithm
- average reward
- initial state
- reward function
- model free
- decision problems
- learning rate
- access control
- partially observable Markov decision processes
- dynamic programming
- policy iteration
- action space
- mobile robot
- state-action
- stochastic approximation
- hierarchical reinforcement learning
- sufficient conditions