Towards Achieving Sub-linear Regret and Hard Constraint Violation in Model-free RL.

Arnob Ghosh, Xingyu Zhou, Ness B. Shroff

Published in: AISTATS (2024)

Keyphrases

model free
reinforcement learning
reinforcement learning algorithms
function approximation
linear constraints
temporal difference
rl algorithms
policy iteration
policy evaluation
reinforcement learning methods
reward function
constraint violations
impedance control
average reward
function approximators
markov decision processes
dynamic programming
regret bounds
artificial neural networks
learning algorithm
data mining