Near Optimal Policy Optimization via REPS.
Aldo Pacchiano, Jonathan N. Lee, Peter L. Bartlett, Ofir Nachum
Published in: CoRR (2021)
Keyphrases
- optimal policy
- markov decision processes
- decision problems
- state space
- infinite horizon
- dynamic programming
- finite horizon
- multistage
- finite state
- state dependent
- long run
- sufficient conditions
- reinforcement learning
- control policies
- develop a mathematical model
- bayesian reinforcement learning
- policy iteration
- average cost
- partially observable markov decision processes
- markov decision process
- inventory level
- average reward
- lost sales
- optimal pricing
- reward function