Langevin Thompson Sampling with Logarithmic Communication: Bandits and Reinforcement Learning.
Amin Karbasi, Nikki Lijing Kuang, Yi-An Ma, Siddharth Mitra
Published in: CoRR (2023)
Keyphrases
- reinforcement learning
- multi armed bandit
- random sampling
- regret bounds
- model free
- information exchange
- state space
- communication overhead
- multi agent
- function approximation
- machine learning
- learning process
- multi armed bandits
- stochastic systems
- distributed control
- reinforcement learning methods
- reinforcement learning algorithms
- communication channels
- dynamic programming
- linear regression
- communication networks
- worst case