Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces.

Yinglun Zhu, Paul Mineiro

Published in: ICML (2022)

Keyphrases

efficient learning
action space
state space
Markov decision processes
continuous action
reinforcement learning
real-valued
multi-armed bandit
continuous state
state and action spaces
continuous state spaces
regret bounds
stochastic processes
multi-armed bandit problems
action selection
lower bound
skill learning
learning algorithm
membership queries
online learning
Markov chain
dynamic programming
reward function
least squares
bandit problems
search algorithm