Pretrain Soft Q-Learning with Imperfect Demonstrations.

Xiaoqin Zhang, Yunfei Li, Huimin Ma, Xiong Luo

Published in: CoRR (2019)

Keyphrases

reinforcement learning
cooperative
learning algorithm
multi agent
state space
stochastic approximation
function approximation
learning rate
model free
optimal policy
temporal difference learning
action selection
reinforcement learning algorithms
multi agent reinforcement learning
real time
relational reinforcement learning
potential field
partial information
information systems
hierarchical reinforcement learning
machine learning