Finite Horizon Q-learning: Stability, Convergence and Simulations.
Vivek VP, Shalabh Bhatnagar
Published in: CoRR (2021)
Keyphrases
- finite horizon
- optimal policy
- Markov decision processes
- stochastic shortest path
- infinite horizon
- stochastic approximation
- state space
- reinforcement learning
- optimal stopping
- single product
- inventory models
- dynamic programming
- Markov decision process
- inventory control
- decision problems
- multistage
- long run
- multi agent
- average cost
- sufficient conditions
- finite state
- yield management
- learning algorithm
- control policies
- lost sales
- reward function
- lot size
- state dependent
- expected reward
- search algorithm
- initial state
- partially observable
- probability distribution