Meta-Gradient Reinforcement Learning with an Objective Discovered Online.
Zhongwen Xu, Hado Philip van Hasselt, Matteo Hessel, Junhyuk Oh, Satinder Singh, David Silver
Published in: NeurIPS (2020)
Keyphrases
- reinforcement learning
- online learning
- function approximation
- policy gradient
- meta level
- real time
- state space
- balancing exploration and exploitation
- temporal difference
- learning process
- domain knowledge
- multi agent systems
- optimal policy
- markov decision processes
- learning algorithm
- genetic algorithm
- machine learning
- temporal difference learning
- neural network
- data sets