To the Max: Reinventing Reward in Reinforcement Learning.
Grigorii Veviurko, Wendelin Böhmer, Mathijs de Weerdt
Published in: CoRR (2024)
Keyphrases
- reinforcement learning
- function approximation
- state space
- reinforcement learning algorithms
- model free
- machine learning
- reward function
- eligibility traces
- optimal policy
- markov decision processes
- learning algorithm
- learning problems
- markov decision process
- average reward
- transfer learning
- dynamic programming
- temporal difference
- policy search
- real robot
- control policy
- policy gradient
- robotic control
- reward shaping
- learning agents
- neural network
- decision making
- long run
- multi agent