A policy improvement method for constrained average Markov decision processes.
Hyeong Soo Chang
Published in:
Oper. Res. Lett. (2007)
Keyphrases
markov decision processes
dynamic programming
optimal policy
policy iteration
state space
reinforcement learning
real-valued
partially observable
decision processes
average reward