Nearly Optimal Regret for Learning Adversarial MDPs with Linear Function Approximation.

Jiafan He, Dongruo Zhou, Quanquan Gu

Published in: CoRR (2021)

Keyphrases

function approximation
reinforcement learning
learning tasks
function approximators
learning algorithm
temporal difference methods
learning process
supervised learning
Markov decision processes
temporal difference learning algorithms
temporal difference learning
pattern recognition
policy evaluation
TD learning
multi-agent
machine learning
optimal policy
radial basis function
dynamic programming
optimal control
model-free
learning agent
reinforcement learning problems