Minimax-optimal reward-agnostic exploration in reinforcement learning.
Gen Li, Yuling Yan, Yuxin Chen, Jianqing Fan
Published in: COLT (2024)
Keyphrases
- reinforcement learning
- exploration strategy
- optimal control
- dynamic programming
- worst case
- average reward
- Markov decision processes
- total reward
- action selection
- approximate dynamic programming
- multi armed bandit
- optimal policy
- function approximation
- control policy
- reinforcement learning algorithms
- learning algorithm
- long run
- state space
- optimal solution
- reward function
- learning process
- multi agent
- machine learning
- active exploration