UBEV - A More Practical Algorithm for Episodic RL with Near-Optimal PAC and Regret Guarantees.

Christoph Dann, Tor Lattimore, Emma Brunskill

Published in: CoRR (2017)

Keyphrases

learning algorithm
dynamic programming
worst case
objective function
detection algorithm
preprocessing
segmentation algorithm
theoretical guarantees
model-free
particle swarm optimization
probabilistic model
active learning
search space
machine learning
least squares
reinforcement learning
simulated annealing
NP-hard
theoretical analysis
cost function
k-means