Striving for Simplicity in Off-policy Deep Reinforcement Learning.

Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi

Published in: CoRR (2019)

Keyphrases

reinforcement learning
function approximation
temporal difference
Markov decision processes
learning algorithm
model free
real time
deep learning
learning process
state space
optimal policy
reinforcement learning methods
reinforcement learning algorithms
action selection
database
supervised learning
multi agent
Monte Carlo
probabilistic model
learning problems
dynamic programming
learning classifier systems
search space
case study
social networks
machine learning
temporal difference learning
neural network
stochastic approximation
robotic control