Towards Achieving Sub-linear Regret and Hard Constraint Violation in Model-free RL.
Arnob Ghosh, Xingyu Zhou, Ness B. Shroff
Published in: AISTATS (2024)
Keyphrases
- model free
- reinforcement learning
- reinforcement learning algorithms
- function approximation
- linear constraints
- temporal difference
- rl algorithms
- policy iteration
- policy evaluation
- reinforcement learning methods
- reward function
- constraint violations
- impedance control
- average reward
- function approximators
- markov decision processes
- dynamic programming
- regret bounds
- artificial neural networks
- learning algorithm
- data mining