Model-Free Algorithm and Regret Analysis for MDPs with Long-Term Constraints.

Qinbo Bai, Vaneet Aggarwal, Ather Gattami

Published in: CoRR (2020)

Keyphrases

model free
long term
reinforcement learning
policy iteration
learning algorithm
dynamic programming
worst case
objective function
optimal solution
markov decision processes
neural network
reinforcement learning algorithms
e learning
monte carlo
average reward
confidence bounds