Correcting discount-factor mismatch in on-policy policy gradient methods.
Fengdi Che, Gautham Vasan, A. Rupam Mahmood
Published in: CoRR (2023)
Keyphrases
- policy gradient methods
- discount factor
- policy gradient
- optimal policy
- natural actor critic
- average reward
- Markov decision processes
- Markov decision problems
- learning rate
- reinforcement learning problems
- actor critic
- infinite horizon
- reinforcement learning
- partially observable
- state space
- decision problems
- linear programming
- average cost
- robot arm
- long run
- finite state
- machine learning
- policy iteration
- function approximation
- neural network
- multi-agent systems