A variational formula for risk-sensitive reward.
Venkatachalam Anantharam, Vivek Shripad Borkar
Published in: CoRR (2015)
Keyphrases
- risk sensitive
- optimal control
- Markov decision processes
- reinforcement learning
- utility function
- model free
- average reward
- reward function
- decision theoretic
- long run
- optimality criterion
- average cost
- decision problems
- control policies
- Markov decision chains
- real time
- optimal policy
- temporal difference
- finite horizon
- infinite horizon
- Markov decision problems
- efficient optimization
- decision makers
- function approximation