Correcting discount-factor mismatch in on-policy policy gradient methods.
Fengdi Che, Gautham Vasan, A. Rupam Mahmood
Published in: ICML (2023)
Keyphrases
- policy gradient methods
- discount factor
- policy gradient
- natural actor critic
- optimal policy
- Markov decision problems
- Markov decision processes
- average reward
- reinforcement learning problems
- learning rate
- actor critic
- reinforcement learning
- robot arm
- partially observable
- state space
- infinite horizon
- long run
- policy iteration
- convergence rate
- dynamic programming
- function approximation
- function approximators
- linear programming
- finite state
- decision problems
- gradient method
- temporal difference
- computational complexity
- approximation methods
- partially observable Markov decision processes
- sufficient conditions
- multi agent