An Improved Soft Q Imitation Learning based on Normalized Reward.
Xiangren Kong, Gang Feng
Published in: RICAI (2022)
Keyphrases
- imitation learning
- reinforcement learning
- maximum margin
- humanoid robot
- robotic systems
- function approximation
- reinforcement learning methods
- state space
- Markov decision processes
- support vector machine
- reinforcement learning algorithms
- average reward
- background knowledge
- concept learning
- optimal policy
- temporal difference
- reward function
- control problems