A policy improvement method for constrained average Markov decision processes.

Hyeong Soo Chang
Published in: Oper. Res. Lett. (2007)
Keyphrases
  • Markov decision processes
  • dynamic programming
  • optimal policy
  • policy iteration
  • state space
  • reinforcement learning
  • real-valued
  • partially observable
  • decision processes
  • average reward