No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions.

Tiancheng Jin, Junyan Liu, Chloé Rouyer, William Chang, Chen-Yu Wei, Haipeng Luo

Published in: CoRR (2023)

Keyphrases

reinforcement learning
online learning
multi-agent
online algorithms
function approximation
state space
optimal policy
balancing exploration and exploitation
temporal difference
real time
dynamic programming
learning algorithm
mobile robot
optimal control
binary classification
learning process
regret bounds
regret minimization
online convex optimization
neural network