Finite-Time Convergence and Sample Complexity of Multi-Agent Actor-Critic Reinforcement Learning with Average Reward.
Hairi, Jia Liu, Songtao Lu
Published in: ICLR (2022)
Keyphrases
- actor critic
- reinforcement learning
- average reward
- multi agent
- learning problems
- markov decision processes
- optimal policy
- policy gradient
- policy iteration
- temporal difference
- reinforcement learning algorithms
- model free
- supervised learning
- function approximation
- learning algorithm
- stochastic games
- state space
- rl algorithms
- approximate dynamic programming
- state action
- convergence rate
- long run
- optimal control
- upper bound
- active learning
- dynamic programming
- single agent
- finite number
- multi agent systems
- markov chain
- partially observable markov decision processes
- reinforcement learning methods
- markov decision problems
- learning process
- convergence speed
- monte carlo
- reward function
- action selection
- linear programming
- least squares
- decision problems
- gradient method
- neuro fuzzy
- finite state
- search space