Online learning in MDPs with linear function approximation and bandit feedback.

Gergely Neu, Julia Olkhovskaya

Published in: NeurIPS (2021)

Keyphrases

function approximation
online learning
reinforcement learning
temporal difference learning algorithms
function approximators
regret bounds
Markov decision processes
temporal difference
state space
model free
policy evaluation
temporal difference learning
Markov decision problems
learning tasks
radial basis function
e-learning
reinforcement learning problems
policy iteration
dynamic programming
optimal policy
multi agent
policy search
machine learning
data mining
linear programming
supervised learning
training data
learning algorithm