Regret Bounds for Kernel-Based Reinforcement Learning.

Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann, Michal Valko

Published in: CoRR (2020)

Keyphrases

reinforcement learning
regret bounds
multi-armed bandit
online learning
state space
support vector machine
lower bound
learning process
model-free
linear regression
optimal policy
learning algorithm
temporal difference
kernel methods
e-learning
learning problems
online convex optimization
markov decision processes
upper bound
gaussian mixture
probabilistic model
support vector
machine learning