Reinforcement learning from suboptimal demonstrations based on Reward Relabeling.
Yong Peng, Junjie Zeng, Yue Hu, Qi Fang, Quanjun Yin
Published in: Expert Syst. Appl. (2024)
Keyphrases
- reinforcement learning
- function approximation
- state space
- eligibility traces
- reinforcement learning algorithms
- dynamic programming
- multi-agent
- learning algorithm
- total reward
- reinforcement learning methods
- Markov decision processes
- reward function
- partially observable environments
- learning agent
- temporal difference
- model free
- optimal control
- transfer learning
- Markov decision process
- optimal policy
- computationally efficient
- supervised learning
- mobile robot
- reward shaping
- learning process
- action selection
- learning problems
- control policy
- policy gradient
- transition model