Infinite-horizon Off-Policy Policy Evaluation with Multiple Behavior Policies.
Xinyun Chen, Lu Wang, Yizhe Hang, Heng Ge, Hongyuan Zha
Published in: ICLR (2020)
Keyphrases
- infinite horizon
- optimal policy
- policy evaluation
- policy iteration
- Markov decision processes
- finite horizon
- Markov decision process
- state space
- partially observable Markov decision processes
- average cost
- long run
- dynamic programming
- stochastic demand
- holding cost
- reinforcement learning
- Markov decision problems
- optimal control
- decision problems
- single item
- lost sales
- partially observable
- least squares
- finite state
- multistage
- average reward
- model free
- decision processes
- temporal difference
- learning algorithm
- evaluation function
- Bayesian networks