No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions.
Tiancheng Jin, Junyan Liu, Chloé Rouyer, William Chang, Chen-Yu Wei, Haipeng Luo
Published in: NeurIPS (2023)
Keyphrases
- reinforcement learning
- online learning
- online convex optimization
- multi-agent
- Markov decision processes
- state space
- reward function
- online algorithms
- balancing exploration and exploitation
- dynamic programming
- learning process
- real time
- lower bound
- machine learning
- reinforcement learning algorithms
- transfer learning
- confidence bounds
- regret bounds
- long run
- function approximation
- learning problems
- loss function
- learning algorithm
- data sets