Efficient Rate Optimal Regret for Adversarial Contextual MDPs Using Online Function Approximation.
Orin Levy, Alon Cohen, Asaf B. Cassel, Yishay Mansour
Published in: ICML (2023)
Keyphrases
- function approximation
- reinforcement learning
- online learning
- dynamic programming
- temporal difference
- Markov decision processes
- temporal difference learning
- state space
- function approximators
- radial basis function
- temporal difference learning algorithms
- learning tasks
- model free
- support vector
- policy evaluation
- reinforcement learning problems
- average cost
- least squares
- multi agent