Chaining Value Functions for Off-Policy Learning.
Simon Schmitt, John Shawe-Taylor, Hado van Hasselt. Published in: AAAI (2022)
Keyphrases
- learning process
- learning algorithm
- learning problems
- learning systems
- online learning
- supervised learning
- concept learning
- prior knowledge
- reinforcement learning
- computer vision
- user interface
- multi agent systems
- learning environment
- multi class
- neural network
- collaborative learning
- empirical studies
- website
- background knowledge
- artificial intelligence
- learning community
- learning scheme
- positive examples
- computer programming