Continuous-time Risk-sensitive Reinforcement Learning via Quadratic Variation Penalty.
Yanwei Jia
Published in: CoRR (2024)
Keyphrases
- risk sensitive
- optimal control
- reinforcement learning
- model free
- markov decision processes
- objective function
- state space
- dynamic programming
- markov decision chains
- infinite horizon
- control policies
- function approximation
- control strategy
- reinforcement learning algorithms
- optimal policy
- dynamical systems
- utility function
- average cost
- action space
- mathematical model
- supervised learning
- computational complexity
- markov decision problems
- learning algorithm
- action selection
- temporal difference
- multi objective
- policy iteration
- multi agent
- bayesian networks