Horizon-Free Reinforcement Learning in Polynomial Time: the Power of Stationary Policies.

Zihan Zhang, Xiangyang Ji, Simon S. Du

Published in: CoRR (2022)

Keyphrases

reinforcement learning
action sets
Markov decision processes
stationary policies
state space
Markov decision process
special case
optimal policy
function approximation
learning algorithm
dynamic programming
multi-agent
finite state
temporal difference
reward function
partially observable
model-free
function approximators
learning tasks
lower bound