Near Optimal Policy Optimization via REPS.
Aldo Pacchiano, Jonathan N. Lee, Peter L. Bartlett, Ofir Nachum
Published in: NeurIPS (2021)
Keyphrases
- optimal policy
- decision problems
- Markov decision processes
- finite horizon
- reinforcement learning
- state space
- dynamic programming
- infinite horizon
- long run
- state dependent
- Bayesian reinforcement learning
- multistage
- finite state
- sufficient conditions
- Markov decision process
- average cost
- average reward
- serial inventory systems
- stochastic optimization
- optimal strategy
- policy iteration
- inventory level
- reward function
- lost sales
- control policies
- develop a mathematical model
- learning algorithm