A basic formula for performance gradient estimation of semi-Markov decision processes.
Yanjie Li
Fang Cao
Published in:
Eur. J. Oper. Res. (2013)
Keyphrases
gradient estimation
semi-Markov decision processes
Markov decision processes
average reward
variance reduction
optimal policy
long run
learning algorithm
training data
parameter estimation