Achieving Sub-linear Regret in Infinite Horizon Average Reward Constrained MDP with Linear Function Approximation.
Arnob Ghosh, Xingyu Zhou, Ness B. Shroff. Published in: ICLR (2023)
Keyphrases
- function approximation
- Markov decision processes
- infinite horizon
- optimal policy
- average reward
- reinforcement learning
- total reward
- policy iteration
- long run
- model free
- reinforcement learning algorithms
- finite horizon
- Markov decision process
- dynamic programming
- state space
- temporal difference
- neural network
- function approximators
- average cost
- reward function
- TD learning
- action selection
- finite state
- actor critic
- learning tasks
- Markov decision problems
- discount factor
- discounted reward
- partially observable Markov decision processes
- optimal control
- fixed point
- supervised learning
- control system
- learning algorithm