ReLOAD: Reinforcement Learning with Optimistic Ascent-Descent for Last-Iterate Convergence in Constrained MDPs.
Ted Moskovitz, Brendan O'Donoghue, Vivek Veeriah, Sebastian Flennerhag, Satinder Singh, Tom Zahavy
Published in: CoRR (2023)
Keyphrases
- reinforcement learning
- markov decision processes
- state space
- stochastic approximation
- optimal policy
- function approximation
- policy iteration
- reinforcement learning algorithms
- markov decision process
- state and action spaces
- policy search
- learning process
- action space
- multi agent
- partially observable
- continuous state and action spaces
- model based reinforcement learning
- stochastic shortest path
- control problems
- dynamic programming
- model free
- action sets
- markov decision problems
- temporal difference learning
- average reward
- reward function
- action selection
- average cost
- policy evaluation
- approximate dynamic programming
- state abstraction
- convergence rate
- learning algorithm
- function approximators
- infinite horizon
- finite state
- decision theoretic planning
- convergence speed
- iterative algorithms