Can Q-Learning be Improved with Advice?

Noah Golowich, Ankur Moitra

Published in: CoRR (2021)

Keyphrases

reinforcement learning
multiscale
learning algorithm
multi-agent
function approximation
database
cooperative
state space
stochastic approximation
data sets
e-learning
sufficient conditions
optimal policy
temporal difference learning