A Fully Problem-Dependent Regret Lower Bound for Finite-Horizon MDPs.
Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric
Published in: CoRR (2021)
Keyphrases
- finite horizon
- lower bound
- Markov decision processes
- optimal policy
- infinite horizon
- upper bound
- optimal stopping
- objective function
- multistage
- inventory control
- inventory models
- single product
- average cost
- worst case
- Markov decision process
- NP-hard
- lower and upper bounds
- optimal solution
- finite state
- upper and lower bounds
- lot size
- control policies
- expected reward
- long run
- reward function
- decision problems
- state space
- reinforcement learning
- online algorithms
- search algorithm
- finite number
- non-stationary
- dynamic programming