Off-Policy Training for Truncated TD(λ) Boosted Soft Actor-Critic.
Shiyu Huang, Bin Wang, Hang Su, Dong Li, Jianye Hao, Jun Zhu, Ting Chen
Published in: PRICAI (3) (2021)
Keyphrases
- temporal difference
- actor critic
- reinforcement learning
- reinforcement learning algorithms
- function approximation
- evaluation function
- Monte Carlo
- policy gradient
- supervised learning
- optimal control
- learning algorithm
- linear programming
- neural network
- Markov decision processes
- state space
- search space
- training set
- optimal solution