Policy Gradient for s-Rectangular Robust Markov Decision Processes.
Navdeep Kumar, Esther Derman, Matthieu Geist, Kfir Levy, Shie Mannor
Published in: CoRR (2023)
Keyphrases
- markov decision processes
- reinforcement learning algorithms
- average reward
- policy gradient
- reinforcement learning
- optimal policy
- dynamic programming
- state space
- finite state
- policy iteration
- actor critic
- action space
- infinite horizon
- partially observable
- partially observable markov decision processes
- markov decision process
- model free
- long run
- function approximation
- stochastic games
- multi agent
- reward function
- convergence rate
- markov chain
- search algorithm