Learning in structured MDPs with convex cost functions: Improved regret bounds for inventory management.

Shipra Agrawal, Randy Jia

Published in: CoRR (2019)

Keyphrases

inventory management
cost function
reinforcement learning
learning algorithm
online learning
Markov decision processes
convex optimization
linear programming
maximum likelihood
linear predictors