Efficient Exploration in Average-Reward Constrained Reinforcement Learning: Achieving Near-Optimal Regret With Posterior Sampling.
Danil Provodin, Maurits Kaptein, Mykola Pechenizkiy
Published in: CoRR (2024)
Keyphrases
- average reward
- reinforcement learning
- total reward
- optimal policy
- markov decision processes
- reward function
- model free
- reinforcement learning algorithms
- long run
- semi markov decision processes
- policy iteration
- stochastic games
- optimality criterion
- discounted reward
- function approximation
- state space
- state action
- state and action spaces
- actor critic
- probability distribution
- hierarchical reinforcement learning
- temporal difference
- policy gradient
- machine learning
- posterior distribution
- dynamic programming
- rl algorithms
- partially observable markov decision processes
- lower bound
- multi agent
- markov chain
- monte carlo
- learning algorithm
- partially observable
- active learning
- markov decision problems
- action selection
- finite state
- convergence rate
- learning problems
- decision problems
- least squares
- dynamical systems
- transfer learning