No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions.
Tiancheng Jin, Junyan Liu, Chloé Rouyer, William Chang, Chen-Yu Wei, Haipeng Luo
Published in: CoRR (2023)
Keyphrases
- reinforcement learning
- online learning
- multi-agent
- online algorithms
- function approximation
- state space
- optimal policy
- balancing exploration and exploitation
- temporal difference
- real time
- dynamic programming
- learning algorithm
- mobile robot
- optimal control
- binary classification
- learning process
- regret bounds
- regret minimization
- online convex optimization
- neural network