Qgraph-bounded Q-learning: Stabilizing Model-Free Off-Policy Deep Reinforcement Learning.
Sabrina Hoppe, Marc Toussaint
Published in: CoRR (2020)
Keyphrases
- model free
- reinforcement learning
- reinforcement learning algorithms
- function approximation
- policy iteration
- temporal difference
- rl algorithms
- reinforcement learning methods
- state space
- learning algorithm
- temporal difference learning
- optimal policy
- partially observable
- policy evaluation
- Markov decision processes
- continuous state and action spaces
- hierarchical reinforcement learning
- multi agent reinforcement learning
- machine learning
- artificial neural networks
- average reward
- transfer learning
- action selection
- state action
- policy gradient
- dynamic programming
- impedance control