Nearly Minimax Optimal Reward-free Reinforcement Learning.
Zihan Zhang, Simon S. Du, Xiangyang Ji
Published in: CoRR (2020)
Keyphrases
- reinforcement learning
- optimal control
- worst case
- total reward
- average reward
- control policy
- dynamic programming
- state space
- reinforcement learning algorithms
- machine learning
- model free
- function approximation
- optimal solution
- approximate dynamic programming
- average cost
- learning algorithm
- optimal policy
- long run
- markov decision processes
- learning problems
- action selection
- learning capabilities
- multi agent
- initially unknown
- multi armed bandit