Exploiting Reward Shifting in Value-Based Deep RL.
Hao Sun, Lei Han, Rui Yang, Xiaoteng Ma, Jian Guo, Bolei Zhou
Published in: CoRR (2022)
Keyphrases
- reinforcement learning
- function approximation
- policy gradient
- state space
- optimal policy
- machine learning
- reward function
- model-free
- Markov decision processes
- multi-agent
- temporal difference
- RL algorithms
- control policy
- total reward
- eligibility traces
- long run
- neural network
- learning algorithm
- complex domains
- learning agent
- average reward
- deep learning
- reinforcement learning methods
- transfer learning
- action selection
- learning classifier systems
- active learning