Off-Policy Interval Estimation with Lipschitz Value Iteration.
Ziyang Tang, Yihao Feng, Na Zhang, Jian Peng, Qiang Liu
Published in: CoRR (2020)
Keyphrases
- interval estimation
- Markov decision processes
- finite state
- state space
- dynamic programming
- policy iteration
- optimal policy
- pointwise
- reinforcement learning
- Markov decision process
- collaborative filtering
- factored MDPs
- average reward
- average cost
- Markov decision chains
- Hilbert space
- partially observable
- reward function
- infinite horizon
- linear programming
- social networks
- data sets
- least squares
- special case
- search algorithm
- neural network
- databases
- stochastic shortest path