Beyond Exponentially Fast Mixing in Average-Reward Reinforcement Learning via Multi-Level Monte Carlo Actor-Critic.
Wesley A. Suttle, Amrit S. Bedi, Bhrij Patel, Brian M. Sadler, Alec Koppel, Dinesh Manocha
Published in: ICML (2023)
Keyphrases
- monte carlo
- average reward reinforcement learning
- actor critic
- temporal difference
- reinforcement learning
- optimal policy
- policy iteration
- policy gradient
- optimal control
- markov chain
- reinforcement learning algorithms
- importance sampling
- neuro fuzzy
- approximate dynamic programming
- temporal difference learning
- function approximation
- variance reduction
- state space
- monte carlo tree search
- gradient method
- learning algorithm
- control problems
- particle filter
- adaptive control
- action selection
- model free
- kalman filter
- dynamic programming
- bayesian networks