Variance aware reward smoothing for deep reinforcement learning.
Yunlong Dong, Shengjun Zhang, Xing Liu, Yu Zhang, Tan Shen
Published in: Neurocomputing (2021)
Keyphrases
- reinforcement learning
- function approximation
- eligibility traces
- state space
- reinforcement learning algorithms
- reward function
- optimal policy
- transfer learning
- temporal difference
- reinforcement learning methods
- average reward
- machine learning
- markov decision processes
- policy gradient
- smoothing methods
- learning algorithm
- model free
- standard deviation
- learning problems
- supervised learning
- multi agent
- covariance matrix
- partially observable
- smoothing algorithm
- total reward
- optimal control
- multiscale
- low variance
- image smoothing
- learning process
- probabilistic model
- variance reduction
- deep learning
- policy iteration