Utilizing Prior Solutions for Reward Shaping and Composition in Entropy-Regularized Reinforcement Learning.
Jacob Adamczyk, Argenis Arriojas, Stas Tiomkin, Rahul V. Kulkarni
Published in: AAAI (2023)
Keyphrases
- reward shaping
- reinforcement learning
- complex domains
- reinforcement learning algorithms
- state space
- machine learning
- model free
- function approximation
- optimal solution
- Markov decision problems
- neural network
- Markov decision processes
- Pareto optimal
- domain knowledge
- temporal difference
- reward function
- learning process
- partially observable
- Markov decision process