Constrained Reinforcement Learning with Average Reward Objective: Model-Based and Model-Free Algorithms.
Vaneet Aggarwal, Washim Uddin Mondal, Qinbo Bai
Published in: Found. Trends Optim. (2024)
Keyphrases
- least squares
- model free
- policy iteration
- average reward
- reinforcement learning
- policy evaluation
- rl algorithms
- function approximation
- reinforcement learning algorithms
- temporal difference
- markov decision processes
- stochastic games
- actor critic
- learning algorithm
- optimality criterion
- hierarchical reinforcement learning
- policy gradient reinforcement learning
- state action
- partially observable
- optimal policy
- policy gradient
- state space
- multi agent