QXplore: Q-learning Exploration by Maximizing Temporal Difference Error.

Riley Simmons-Edler, Ben Eisner, Eric Mitchell, H. Sebastian Seung, Daniel D. Lee

Published in: CoRR (2019)

Keyphrases

temporal difference
action selection
TD learning
reinforcement learning
function approximation
reinforcement learning algorithms
temporal difference learning
model free
temporal difference methods
policy iteration
evaluation function
policy evaluation
function approximators
Monte Carlo
reinforcement learning methods
actor critic
state space
TD methods
step size
decision making
radial basis function
supervised learning
Markov games
neural network