Trajectory-Based Off-Policy Deep Reinforcement Learning.
Andreas Doerr, Michael Volpp, Marc Toussaint, Sebastian Trimpe, Christian Daniel
Published in: CoRR (2019)
Keyphrases
- reinforcement learning
- function approximation
- state space
- temporal difference
- reinforcement learning algorithms
- model free
- dynamic programming
- Markov decision processes
- trajectory data
- optimal policy
- robotic control
- multi-agent
- multi-agent reinforcement learning
- learning agents
- robot control
- direct policy search
- action selection
- partially observable
- machine learning
- deep learning
- optimal control
- temporal difference learning
- stochastic approximation
- learning problems
- moving object trajectories
- learning process
- learning algorithm