Self-Supervised Imitation for Offline Reinforcement Learning With Hindsight Relabeling.
Xudong Yu, Chenjia Bai, Changhong Wang, Dengxiu Yu, C. L. Philip Chen, Zhen Wang
Published in: IEEE Trans. Syst. Man Cybern. Syst. (2023)
Keyphrases
- reinforcement learning
- action selection
- function approximation
- state space
- optimal policy
- machine learning
- multi-agent
- learning process
- Markov decision processes
- real time
- model free
- learning algorithm
- imitation learning
- reinforcement learning algorithms
- learning problems
- direct policy search
- temporal difference
- dynamic programming
- temporal difference learning
- reinforcement learning methods
- reward function
- control problems
- information systems
- partially observable environments
- neural network