Online finite-horizon optimal learning algorithm for nonzero-sum games with partially unknown dynamics and constrained inputs.
Xiaohong Cui, Huaguang Zhang, Yanhong Luo, Peifu Zu
Published in: Neurocomputing (2016)
Keyphrases
- finite horizon
- optimal stopping
- learning algorithm
- optimal policy
- infinite horizon
- Markov decision processes
- single product
- inventory models
- average cost
- inventory control
- dynamic programming
- multistage
- reinforcement learning
- periodic review
- lot size
- optimal control
- Markov decision process
- yield management
- online algorithms
- control policies
- stochastic demand
- optimal solution
- real time
- finite number
- single item
- inventory policy
- machine learning
- long run
- dynamical systems
- random variables
- sufficient conditions
- upper bound
- objective function
- ordering cost