An Improved Soft Q Imitation Learning based on Normalized Reward.

Xiangren Kong, Gang Feng

Published in: RICAI (2022)

Keyphrases

imitation learning
reinforcement learning
maximum margin
humanoid robot
robotic systems
function approximation
reinforcement learning methods
state space
markov decision processes
support vector machine
reinforcement learning algorithms
average reward
background knowledge
concept learning
optimal policy
temporal difference
reward function
control problems