Langevin Thompson Sampling with Logarithmic Communication: Bandits and Reinforcement Learning.
Amin Karbasi, Nikki Lijing Kuang, Yi-An Ma, Siddharth Mitra
Published in: ICML (2023)
Keyphrases
- reinforcement learning
- multi-armed bandit
- function approximation
- regret bounds
- learning algorithm
- robotic control
- state space
- multi-agent
- Monte Carlo
- communication overhead
- computer networks
- reinforcement learning algorithms
- temporal difference
- model-free
- communication technologies
- communication channels
- information exchange
- communication networks
- stochastic systems
- supervised learning
- random sampling
- multi-armed bandits
- mobile devices
- online learning
- worst case
- least squares
- dynamic programming
- NP-hard