Beyond Exponentially Fast Mixing in Average-Reward Reinforcement Learning via Multi-Level Monte Carlo Actor-Critic.
Wesley A. Suttle, Amrit Singh Bedi, Bhrij Patel, Brian M. Sadler, Alec Koppel, Dinesh Manocha
Published in: CoRR (2023)
Keyphrases
- monte carlo
- actor critic
- average reward reinforcement learning
- temporal difference
- reinforcement learning
- optimal policy
- reinforcement learning algorithms
- markov chain
- policy iteration
- policy gradient
- average reward
- optimal control
- function approximation
- importance sampling
- markov decision processes
- temporal difference learning
- monte carlo tree search
- neuro fuzzy
- approximate dynamic programming
- particle filter
- state space
- dynamic programming
- variance reduction
- machine learning
- multi agent
- model free
- decision problems
- function approximators
- feature space
- learning problems