Model-Free Algorithm and Regret Analysis for MDPs with Peak Constraints

Qinbo Bai, Ather Gattami, Vaneet Aggarwal

Published in: CoRR (2020)

Keyphrases

model free
reinforcement learning
dynamic programming
learning algorithm
policy iteration
search space
monte carlo
markov decision processes
feature selection
worst case
convergence rate
function approximation
temporal difference
policy evaluation