Performance Bounds for Policy-Based Average Reward Reinforcement Learning Algorithms.
Yashaswini Murthy, Mehrdad Moharrami, R. Srikant
Published in: CoRR (2023)
Keyphrases
- reinforcement learning algorithms
- average reward
- Markov decision processes
- discounted reward
- model free
- optimal policy
- reinforcement learning
- reward function
- reinforcement learning problems
- policy iteration
- state space
- total reward
- policy gradient
- actor critic
- stochastic games
- finite state
- state action
- state and action spaces
- reinforcement learning methods
- dynamic programming
- infinite horizon
- average cost
- decision processes
- Markov decision process
- action space
- function approximation
- partially observable
- policy evaluation
- temporal difference
- long run
- generative model
- Markov chain
- stationary policies
- fixed point
- TD learning
- multi agent