Langevin Thompson Sampling with Logarithmic Communication: Bandits and Reinforcement Learning.

Amin Karbasi, Nikki Lijing Kuang, Yi-An Ma, Siddharth Mitra

Published in: ICML (2023)

Keyphrases

reinforcement learning
multi-armed bandit
function approximation
regret bounds
learning algorithm
robotic control
state space
multi-agent
Monte Carlo
communication overhead
computer networks
reinforcement learning algorithms
temporal difference
model-free
communication technologies
communication channels
information exchange
communication networks
stochastic systems
supervised learning
random sampling
multi-armed bandits
mobile devices
online learning
worst case
least squares
dynamic programming
NP-hard