Offline Meta-Reinforcement Learning with Advantage Weighting.

Eric Mitchell, Rafael Rafailov, Xue Bin Peng, Sergey Levine, Chelsea Finn

Published in: ICML (2021)

Keyphrases

reinforcement learning
function approximation
state space
real time
dynamic programming
feature weighting
weighting scheme
learning algorithm
optimal policy
meta level
optimal control
markov decision processes
learning process
learning problems
multi agent
model free
temporal difference
database
search space
similarity measure
information retrieval
machine learning
data mining
action space
temporal difference learning
policy gradient
policy search