Off-policy Multi-step Q-learning.

Gabriel Kalweit, Maria Hügle, Joschka Boedecker

Published in: CoRR (2019)

Keyphrases

multi step
td learning
reinforcement learning
function approximation
cooperative
state space
multi agent
stochastic approximation
lower bounding
optimal policy
learning algorithm
learning rate
reinforcement learning algorithms
action selection
single step
knn
model free
semi supervised
tumor classification
objective function
policy iteration
supervised learning
training set