A Long N-step Surrogate Stage Reward for Deep Reinforcement Learning.
Junmin Zhong, Ruofan Wu, Jennie Si
Published in: NeurIPS (2023)
Keyphrases
- reinforcement learning
- function approximation
- multi-agent
- state space
- average reward
- model-free
- Markov decision processes
- partially observable
- post processing
- learning process
- partially observable environments
- transfer learning
- learning problems
- total reward
- learning stage
- machine learning
- preprocessing stage
- reinforcement learning algorithms
- policy gradient
- solving problems
- temporal difference learning
- robotic control
- learning agent
- learning capabilities
- temporal difference
- action selection
- optimal control
- supervised learning
- dynamic programming
- decision making
- search engine