Fast Rates for the Regret of Offline Reinforcement Learning.
Yichun Hu, Nathan Kallus, Masatoshi Uehara
Published in: CoRR (2021)
Keyphrases
- reinforcement learning
- reward function
- total reward
- reinforcement learning algorithms
- state space
- function approximation
- real time
- lower bound
- machine learning
- model free
- Markov decision processes
- online learning
- temporal difference
- transfer learning
- worst case
- multi agent systems
- robotic control
- loss function
- regret minimization
- minimax regret
- weighted majority
- multi agent reinforcement learning
- expert advice
- optimal control
- reinforcement learning methods
- temporal difference learning
- partially observable
- pairwise
- learning process
- multi armed bandit
- action selection
- learning tasks