Langevin Thompson Sampling with Logarithmic Communication: Bandits and Reinforcement Learning.

Amin Karbasi, Nikki Lijing Kuang, Yi-An Ma, Siddharth Mitra

Published in: CoRR (2023)

Keyphrases

reinforcement learning
multi-armed bandit
random sampling
regret bounds
model-free
information exchange
state space
communication overhead
multi-agent
function approximation
machine learning
learning process
email
multi-armed bandits
stochastic systems
distributed control
reinforcement learning methods
reinforcement learning algorithms
communication channels
dynamic programming
linear regression
communication networks
worst case