The Primacy Bias in Deep Reinforcement Learning.

Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, Aaron C. Courville

Published in: CoRR (2022)

Keyphrases

reinforcement learning
function approximation
reinforcement learning algorithms
model-free
temporal difference
direct policy search
learning algorithm
optimal policy
Markov decision processes
dynamic programming
state space
reward function
temporal difference learning
learning capabilities
robotic control
multi-agent reinforcement learning
stochastic approximation
reinforcement learning methods
deep learning
Markov decision process
robot control
artificial neural networks
neural network
trade-off
unsupervised learning
supervised learning