Learning in Structured MDPs with Convex Cost Functions: Improved Regret Bounds for Inventory Management.

Shipra Agrawal, Randy Jia
Published in: Oper. Res. (2022)
Keyphrases
  • reinforcement learning
  • inventory management
  • learning algorithm
  • cost function
  • online learning
  • supply chain
  • Markov decision processes
  • machine learning
  • training data
  • dynamic programming
  • online convex optimization