Feel-Good Thompson Sampling for Contextual Bandits and Reinforcement Learning.
Tong Zhang
Published in: SIAM J. Math. Data Sci. (2022)
Keyphrases
- reinforcement learning
- multi-armed bandit
- contextual information
- function approximation
- state space
- multi-armed bandits
- random sampling
- Monte Carlo
- multi-agent
- Markov decision processes
- context-dependent
- model-free
- temporal difference
- learning algorithm
- sample size
- optimal policy
- stochastic systems
- machine learning
- learning problems
- reinforcement learning algorithms
- sampling algorithm
- control policy
- stochastic approximation