Keyphrases
- bandit problems
- decision problems
- reinforcement learning
- multi-armed bandits
- optimal policy
- function approximation
- state space
- cooperative
- multi-agent
- exploration-exploitation
- learning algorithm
- action selection
- stochastic approximation
- reinforcement learning algorithms
- model-free
- decentralized decision making
- multi-agent reinforcement learning
- dynamic programming
- learning rate
- expected utility
- multi-armed bandit problems
- decision makers
- special case
- active learning
- objective function
- artificial intelligence