Minimax-Optimal Reward-Agnostic Exploration in Reinforcement Learning.
Gen Li, Yuling Yan, Yuxin Chen, Jianqing Fan
Published in: CoRR (2023)
Keyphrases
- reinforcement learning
- dynamic programming
- average reward
- optimal control
- exploration strategy
- worst case
- action selection
- state space
- total reward
- control policy
- function approximation
- Markov decision processes
- model-free
- multi-agent
- active exploration
- eligibility traces
- reinforcement learning algorithms
- temporal difference
- optimal solution
- approximate dynamic programming
- initially unknown
- reward function
- partially observable environments