Dynamic Regret of Adversarial MDPs with Unknown Transition and Linear Function Approximation.
Long-Fei Li, Peng Zhao, Zhi-Hua Zhou
Published in: AAAI (2024)
Keyphrases
- function approximation
- reinforcement learning
- temporal difference learning algorithms
- function approximators
- temporal difference
- radial basis function
- temporal difference learning
- Markov decision processes
- learning tasks
- reinforcement learning problems
- model-free
- multi-agent
- Markov decision process
- optimal policy
- state space
- reinforcement learning algorithms
- learning algorithm
- policy evaluation
- machine learning
- supervised learning
- policy search
- reinforcement learning methods
- data mining
- Monte Carlo
- image classification