Online learning in MDPs with linear function approximation and bandit feedback.

Gergely Neu, Julia Olkhovskaya

Published in: CoRR (2020)

Keyphrases

function approximation
online learning
reinforcement learning
temporal difference learning algorithms
function approximators
regret bounds
Markov decision processes
temporal difference
model-free
radial basis function
temporal difference learning
active learning
policy evaluation
policy search
learning tasks
multi-agent
e-learning
state space
Markov decision problems
reinforcement learning algorithms
optimal policy
reinforcement learning problems
temporal difference methods
reinforcement learning methods
finite state
Markov chain
least squares