Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies.
David Balduzzi, Muhammad Ghifary
Published in: CoRR (2015)
Keyphrases
- reinforcement learning
- optimal policy
- fitted q iteration
- policy search
- action space
- control policies
- markov decision process
- state space
- markov decision processes
- control policy
- partially observable markov decision processes
- function approximation
- markov decision problems
- decision problems
- dynamic programming
- continuous state spaces
- reward function
- infinite horizon
- reinforcement learning algorithms
- transfer learning
- learning process
- model free
- deep learning
- hierarchical reinforcement learning
- learning algorithm
- reinforcement learning agents
- learning problems
- multiagent reinforcement learning
- partially observable
- tabula rasa
- real robot
- control problems
- continuous domains
- long run
- optimal control
- bayesian networks