Model-Free Algorithm and Regret Analysis for MDPs with Long-Term Constraints.
Qinbo Bai
Vaneet Aggarwal
Ather Gattami
Published in:
CoRR (2020)
Keyphrases
model free
long term
reinforcement learning
policy iteration
learning algorithm
dynamic programming
worst case
objective function
optimal solution
Markov decision processes
neural network
reinforcement learning algorithms
e learning
Monte Carlo
average reward
confidence bounds