Off-Policy Average Reward Actor-Critic with Deterministic Policy Search.
Naman Saxena, Subhojyoti Khastagir, Shishir Kolathaya, Shalabh Bhatnagar
Published in: ICML (2023)
Keyphrases
- policy gradient
- actor critic
- average reward
- Markov decision processes
- reinforcement learning
- optimal policy
- long run
- partially observable Markov decision processes
- stochastic games
- state action
- model free
- optimal control
- reinforcement learning algorithms
- policy iteration
- gradient method
- Markov chain
- function approximation
- multi agent
- variance reduction
- dynamic programming
- state space
- reward function
- partially observable
- single agent
- learning algorithm
- approximation methods
- domain independent
- neural network
- decision problems
- finite state