Pretrain Soft Q-Learning with Imperfect Demonstrations.
Xiaoqin Zhang, Yunfei Li, Huimin Ma, Xiong Luo
Published in: CoRR (2019)
Keyphrases
- reinforcement learning
- cooperative
- learning algorithm
- multi agent
- state space
- stochastic approximation
- function approximation
- learning rate
- model free
- optimal policy
- temporal difference learning
- action selection
- reinforcement learning algorithms
- multi agent reinforcement learning
- real time
- relational reinforcement learning
- potential field
- partial information
- information systems
- hierarchical reinforcement learning
- machine learning