Feel-Good Thompson Sampling for Contextual Bandits and Reinforcement Learning.

Published in: SIAM J. Math. Data Sci. (2022)

Keyphrases

reinforcement learning
multi-armed bandit
contextual information
function approximation
state space
random sampling
Monte Carlo
multi-agent
Markov decision processes
context-dependent
model-free
temporal difference
learning algorithm
sample size
optimal policy
stochastic systems
machine learning
learning problems
reinforcement learning algorithms
sampling algorithm
control policy
stochastic approximation