A Reduction from Reinforcement Learning to No-Regret Online Learning.
Ching-An Cheng, Remi Tachet des Combes, Byron Boots, Geoffrey J. Gordon
Published in: AISTATS (2020)
Keyphrases
- online learning
- reinforcement learning
- e learning
- computer mediated
- function approximation
- higher education
- distance education
- online course
- distance learning
- online convex optimization
- blended learning
- markov decision processes
- active learning
- online learning environments
- temporal difference
- reinforcement learning algorithms
- learning process
- multi agent
- state space
- regret bounds
- model free
- machine learning
- learning algorithm
- optimal control
- learning problems
- dynamic programming
- neural network
- lower bound
- optimal policy
- supervised learning
- data sets