Performance Bounds for Policy-Based Average Reward Reinforcement Learning Algorithms.
Yashaswini Murthy, Mehrdad Moharrami, R. Srikant
Published in: NeurIPS (2023)
Keyphrases
- reinforcement learning algorithms
- average reward
- Markov decision processes
- discounted reward
- reinforcement learning
- model free
- optimal policy
- reinforcement learning problems
- policy iteration
- total reward
- reward function
- state space
- policy gradient
- stochastic games
- reinforcement learning methods
- finite state
- actor critic
- state and action spaces
- state action
- Markov decision process
- dynamic programming
- temporal difference
- function approximation
- average cost
- partially observable
- infinite horizon
- action space
- decision processes
- long run
- data mining
- Markov decision problems
- stationary policies
- TD learning
- fixed point
- decision problems
- reward shaping