Offline Meta-Reinforcement Learning with Advantage Weighting.
Eric Mitchell, Rafael Rafailov, Xue Bin Peng, Sergey Levine, Chelsea Finn
Published in: ICML (2021)
Keyphrases
- reinforcement learning
- function approximation
- state space
- real time
- dynamic programming
- feature weighting
- weighting scheme
- learning algorithm
- optimal policy
- meta level
- optimal control
- Markov decision processes
- learning process
- learning problems
- multi agent
- model free
- temporal difference
- database
- search space
- similarity measure
- information retrieval
- machine learning
- data mining
- action space
- temporal difference learning
- policy gradient
- policy search