Debiasing Meta-Gradient Reinforcement Learning by Learning the Outer Value Function.
Clément Bonnet, Laurence Midgley, Alexandre Laterre
Published in: CoRR (2022)
Keyphrases
- reinforcement learning
- learning process
- learning algorithm
- learning systems
- supervised learning
- policy gradient
- state action
- function approximators
- temporal difference
- learning problems
- learning tasks
- state space
- mobile learning
- Markov decision processes
- knowledge acquisition
- online learning
- learning analytics
- optimal control
- learning mechanism
- active learning
- multi agent
- machine learning