Phasic Self-Imitative Reduction for Sparse-Reward Goal-Conditioned Reinforcement Learning.
Yunfei Li, Tian Gao, Jiaqi Yang, Huazhe Xu, Yi Wu
Published in: CoRR (2022)
Keyphrases
- reinforcement learning
- state space
- function approximation
- reinforcement learning algorithms
- high dimensional
- reward function
- markov decision processes
- eligibility traces
- agent learns
- learning algorithm
- optimal policy
- model free
- temporal difference
- dynamic programming
- initially unknown
- machine learning
- sparse representation
- supervised learning
- sparse data
- markov decision process
- transfer learning
- sparse coding
- compressed sensing
- average reward