Reducing Variance in Temporal-Difference Value Estimation via Ensemble of Deep Networks.
Litian Liang, Yaosheng Xu, Stephen McAleer, Dailin Hu, Alexander Ihler, Pieter Abbeel, Roy Fox
Published in: CoRR (2022)
Keyphrases
- temporal difference
- td learning
- reinforcement learning
- evaluation function
- function approximation
- monte carlo
- model free
- step size
- temporal difference learning
- action selection
- neural network
- temporal difference methods
- actor critic
- reinforcement learning algorithms
- learning algorithm
- feature selection
- policy evaluation
- genetic algorithm
- importance sampling
- machine learning