f-Policy Gradients: A General Framework for Goal Conditioned RL using f-Divergences.
Siddhant Agarwal
Ishan Durugkar
Peter Stone
Amy Zhang
Published in:
CoRR (2023)
Keyphrases
optimal policy
reinforcement learning
action selection
agent learns
multi-agent
Markov decision process
action space
control policy
infinite horizon
policy iteration
machine learning
Markov decision processes
learning agents
policy search
partially observable domains