Login / Signup
Learning Stochastic Optimal Policies via Gradient Descent.
Stefano Massaroli
Michael Poli
Stefano Peluchetti
Jinkyoo Park
Atsushi Yamashita
Hajime Asama
Published in:
IEEE Control. Syst. Lett. (2022)
Keyphrases
</>
optimal policy
reinforcement learning
learning algorithm
dynamic programming
markov decision processes
finite horizon
data mining
state space
multistage
learning rate
control policies
average reward reinforcement learning