Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces.
Yinglun Zhu, Paul Mineiro
Published in: ICML (2022)
Keyphrases
- efficient learning
- action space
- state space
- Markov decision processes
- continuous action
- reinforcement learning
- real-valued
- multi-armed bandit
- continuous state
- state and action spaces
- continuous state spaces
- regret bounds
- stochastic processes
- multi-armed bandit problems
- action selection
- lower bound
- skill learning
- learning algorithm
- membership queries
- online learning
- Markov chain
- dynamic programming
- reward function
- least squares
- bandit problems
- search algorithm