Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces.
Yinglun Zhu, Paul Mineiro
Published in: CoRR (2022)
Keyphrases
- efficient learning
- action space
- state space
- markov decision processes
- reinforcement learning
- real valued
- continuous action
- state and action spaces
- stochastic processes
- continuous state
- regret bounds
- multi armed bandit
- continuous state spaces
- multi armed bandit problems
- learning algorithm
- action selection
- online learning
- lower bound
- skill learning
- dynamic programming
- single agent
- reward function
- markov decision process
- bandit problems
- policy search
- pattern languages
- multi agent
- membership queries
- xml documents