Global Optimality without Mixing Time Oracles in Average-reward RL via Multi-level Actor-Critic.
Bhrij Patel, Wesley A. Suttle, Alec Koppel, Vaneet Aggarwal, Brian M. Sadler, Amrit Singh Bedi, Dinesh Manocha
Published in: CoRR (2024)
Keyphrases
- actor critic
- average reward
- Markov decision processes
- long run
- reinforcement learning
- policy iteration
- optimal policy
- policy gradient
- rl algorithms
- model free
- stochastic games
- global optimization
- Markov chain
- approximate dynamic programming
- active learning
- reinforcement learning algorithms
- state space
- temporal difference
- gradient method
- optimal control
- least squares
- average cost
- reward function
- state action
- finite state
- neuro fuzzy
- decision problems
- optimal solution