Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning.
Mingqi Yuan, Bo Li, Xin Jin, Wenjun Zeng
Published in: CoRR (2023)
Keyphrases
- reward shaping
- reinforcement learning
- reinforcement learning algorithms
- complex domains
- state space
- action selection
- function approximation
- Markov decision problems
- model-free
- partially observable
- temporal difference
- Markov decision processes
- policy search
- machine learning
- multi-agent
- learning algorithm
- Monte Carlo
- optimal policy
- random walk
- sufficient conditions
- supervised learning
- Markov decision process
- dynamic programming