Provable Reset-free Reinforcement Learning by No-Regret Reduction.
Hoai-An Nguyen, Ching-An Cheng
Published in: ICML (2023)
Keyphrases
- reinforcement learning
- reward function
- online learning
- lower bound
- total reward
- state space
- reinforcement learning algorithms
- worst case
- multi agent
- function approximation
- binary classification
- model free
- temporal difference
- reduction method
- confidence bounds
- regret minimization
- machine learning
- partially observable
- learning algorithm
- regret bounds
- expert advice
- multi agent reinforcement learning
- multi armed bandit
- markov decision processes
- least squares