Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces.

Yinglun Zhu, Paul Mineiro

Published in: CoRR (2022)

Keyphrases

efficient learning
action space
state space
markov decision processes
reinforcement learning
real valued
continuous action
state and action spaces
stochastic processes
continuous state
regret bounds
multi armed bandit
continuous state spaces
multi armed bandit problems
learning algorithm
action selection
online learning
lower bound
skill learning
dynamic programming
single agent
reward function
markov decision process
bandit problems
policy search
pattern languages
multi agent
membership queries
xml documents