Infinite-horizon Off-Policy Policy Evaluation with Multiple Behavior Policies.
Xinyun Chen, Lu Wang, Yizhe Hang, Heng Ge, Hongyuan Zha
Published in: CoRR (2019)
Keyphrases
- infinite horizon
- optimal policy
- policy evaluation
- policy iteration
- Markov decision processes
- finite horizon
- Markov decision process
- partially observable Markov decision processes
- Markov decision problems
- decision problems
- dynamic programming
- reinforcement learning
- stochastic demand
- multistage
- average cost
- single item
- long run
- state space
- optimal control
- Monte Carlo
- partially observable
- lead time
- holding cost
- lost sales
- finite state
- search algorithm