Near Optimal Reward-Free Reinforcement Learning.
Zihan Zhang, Simon S. Du, Xiangyang Ji
Published in: ICML (2021)
Keyphrases
- reinforcement learning
- function approximation
- state space
- reinforcement learning algorithms
- optimal policy
- reward function
- eligibility traces
- temporal difference
- machine learning
- temporal difference learning
- state action
- learning problems
- total reward
- markov decision processes
- neural network
- policy search
- robot control
- average reward
- action space
- agent receives
- policy gradient
- markov decision problems
- learning algorithm
- multi agent
- optimal control
- markov decision process
- partially observable
- learning process
- action selection
- state variables