Provable Reset-free Reinforcement Learning by No-Regret Reduction.

Hoai-An Nguyen, Ching-An Cheng

Published in: CoRR (2023)

Keyphrases

reinforcement learning
online learning
function approximation
total reward
lower bound
loss function
state space
learning algorithm
reward function
optimal policy
model-free
reinforcement learning algorithms
reduction method
Markov decision processes
learning problems
action selection
multi-agent
function approximators
expert advice
multi-armed bandit
confidence bounds
robotic control
machine learning
bandit problems
learning classifier systems
support vector
dynamic programming