Dynamic Regret of Adversarial MDPs with Unknown Transition and Linear Function Approximation.

Long-Fei Li, Peng Zhao, Zhi-Hua Zhou

Published in: AAAI (2024)

Keyphrases

function approximation
reinforcement learning
temporal difference learning algorithms
function approximators
temporal difference
radial basis function
temporal difference learning
Markov decision processes
learning tasks
reinforcement learning problems
model-free
multi-agent
Markov decision process
optimal policy
state space
reinforcement learning algorithms
learning algorithm
policy evaluation
machine learning
supervised learning
policy search
reinforcement learning methods
data mining
Monte Carlo
image classification