Exploit Reward Shifting in Value-Based Deep-RL: Optimistic Curiosity-Based Exploration and Conservative Exploitation via Linear Reward Shaping.
Hao Sun, Lei Han, Rui Yang, Xiaoteng Ma, Jian Guo, Bolei Zhou
Published in: NeurIPS (2022)
Keyphrases
- reward shaping
- reinforcement learning
- exploration exploitation tradeoff
- reinforcement learning algorithms
- complex domains
- function approximation
- state space
- action selection
- reward function
- optimal policy
- learning algorithm
- learning agent
- function approximators
- Markov decision processes
- model-free
- temporal difference
- Markov decision problems
- multi-agent
- transfer learning
- partially observable
- objective function
- learning capabilities
- supervised learning
- average reward