Provable Reset-free Reinforcement Learning by No-Regret Reduction.
Hoai-An Nguyen, Ching-An Cheng
Published in: CoRR (2023)
Keyphrases
- reinforcement learning
- online learning
- function approximation
- total reward
- lower bound
- loss function
- state space
- learning algorithm
- reward function
- optimal policy
- model free
- reinforcement learning algorithms
- reduction method
- Markov decision processes
- learning problems
- action selection
- multi agent
- function approximators
- expert advice
- multi armed bandit
- confidence bounds
- robotic control
- machine learning
- bandit problems
- learning classifier systems
- support vector
- dynamic programming