Efficient Rate Optimal Regret for Adversarial Contextual MDPs Using Online Function Approximation.
Orin Levy, Alon Cohen, Asaf B. Cassel, Yishay Mansour
Published in: CoRR (2023)
Keyphrases
- function approximation
- reinforcement learning
- online learning
- markov decision processes
- dynamic programming
- temporal difference learning
- temporal difference learning algorithms
- temporal difference
- learning tasks
- image classification
- reward function
- optimal control
- reinforcement learning algorithms
- function approximators
- average reward
- state space
- multi agent
- optimal policy
- model free