Controlling Underestimation Bias in Reinforcement Learning via Quasi-median Operation.
Wei Wei, Yujia Zhang, Jiye Liang, Lin Li, Yyuze Li
Published in: AAAI (2022)
Keyphrases
- reinforcement learning
- function approximation
- optimal policy
- reinforcement learning algorithms
- temporal difference
- supervised learning
- state space
- model free
- multi agent
- learning process
- control system
- neural network
- decision trees
- transfer learning
- markov decision processes
- learning problems
- evaluation function
- learning algorithm
- information retrieval
- machine learning
- real world