A basic formula for performance gradient estimation of semi-Markov decision processes.

Yanjie Li, Fang Cao
Published in: Eur. J. Oper. Res. (2013)
Keyphrases
  • gradient estimation
  • semi-Markov decision processes
  • Markov decision processes
  • average reward
  • variance reduction
  • optimal policy
  • long run
  • learning algorithm
  • training data
  • parameter estimation