Learning Stochastic Optimal Policies via Gradient Descent.
Stefano Massaroli
Michael Poli
Stefano Peluchetti
Jinkyoo Park
Atsushi Yamashita
Hajime Asama
Published in:
CoRR (2021)
Keyphrases
optimal policy
reinforcement learning
Markov decision processes
decision problems
average reward reinforcement learning
learning algorithm
dynamic programming
state space
search algorithm
cost function
Markov chain
finite state
state dependent
dynamic programming algorithms