Exploration in Reward Machines with Low Regret.
Hippolyte Bourel
Anders Jonsson
Odalric-Ambrym Maillard
Mohammad Sadegh Talebi
Published in:
AISTATS (2023)
Keyphrases
bandit problems
reinforcement learning
lower bound
online learning
high levels
expected reward
neural network
worst case
reward function
support vector
decision problems
long run
weighted majority
minimax regret