Policy invariant explicit shaping: an efficient alternative to reward shaping.
Paniz Behboudian, Yash Satsangi, Matthew E. Taylor, Anna Harutyunyan, Michael Bowling
Published in: Neural Comput. Appl. (2022)
Keyphrases
- reward shaping
- Markov decision problems
- reinforcement learning
- complex domains
- optimal policy
- reinforcement learning algorithms
- reward function
- state space
- Markov decision process
- linear programming
- decision processes
- policy search
- Markov decision processes
- agent learns
- decision theoretic
- transition probabilities
- infinite horizon
- random walk
- partially observable
- policy iteration
- least squares
- expected utility
- transition model
- action space
- policy gradient
- temporal difference
- neural network
- action selection
- dynamic programming
- domain knowledge
- multi-agent
- learning algorithm