ReLOAD: Reinforcement Learning with Optimistic Ascent-Descent for Last-Iterate Convergence in Constrained MDPs.
Ted Moskovitz, Brendan O'Donoghue, Vivek Veeriah, Sebastian Flennerhag, Satinder Singh, Tom Zahavy
Published in: ICML (2023)
Keyphrases
- reinforcement learning
- markov decision processes
- state space
- stochastic approximation
- function approximation
- optimal policy
- policy iteration
- convergence speed
- reinforcement learning algorithms
- policy search
- policy evaluation
- multi agent
- control problems
- continuous state and action spaces
- partially observable
- model free
- continuous state
- temporal difference learning
- learning algorithm
- stochastic shortest path
- reward function
- temporal difference
- markov decision process
- model based reinforcement learning
- action space
- state and action spaces
- optimal control
- dynamic programming
- continuous state spaces
- average cost
- learning process
- least squares
- convergence rate
- theoretical justification
- probabilistic planning
- markov decision problems
- reinforcement learning methods
- average reward
- action selection
- step size
- stationary policies
- rl algorithms
- approximate dynamic programming
- planning under uncertainty
- transition model
- reinforcement learning problems
- action sets
- iterative algorithms
- machine learning
- factored markov decision processes