Deep Q-learning From Demonstrations.

Todd Hester, Matej Vecerík, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Ian Osband, Gabriel Dulac-Arnold, John P. Agapiou, Joel Z. Leibo, Audrunas Gruslys

Published in: AAAI (2018)

Keyphrases

reinforcement learning
cooperative
function approximation
multi agent
state space
stochastic approximation
learning algorithm
optimal policy
reinforcement learning algorithms
model free
multiagent learning
decision trees
learning rate
neural network
markov chain
knowledge base
temporal difference learning
continuous state spaces
bucket brigade
credit assignment