Off-Policy Interval Estimation with Lipschitz Value Iteration.
Ziyang Tang, Yihao Feng, Na Zhang, Jian Peng, Qiang Liu
Published in: NeurIPS (2020)
Keyphrases
- interval estimation
- markov decision processes
- policy iteration
- state space
- optimal policy
- dynamic programming
- finite state
- reinforcement learning
- pointwise
- infinite horizon
- average reward
- databases
- partially observable
- metric space
- factored mdps
- data sets
- average cost
- markov decision process
- lower bound
- reward function
- image sequences
- knowledge base
- genetic algorithm
- markov decision chains