Near Optimal Reward-Free Reinforcement Learning

Zihan Zhang, Simon S. Du, Xiangyang Ji

Published in: ICML (2021)

Keyphrases

reinforcement learning
function approximation
state space
reinforcement learning algorithms
optimal policy
reward function
eligibility traces
temporal difference
machine learning
temporal difference learning
state action
learning problems
total reward
markov decision processes
neural network
policy search
robot control
average reward
action space
agent receives
policy gradient
markov decision problems
learning algorithm
multi agent
optimal control
markov decision process
partially observable
learning process
action selection
state variables