Conformal Off-Policy Evaluation in Markov Decision Processes.
Daniele Foffano, Alessio Russo, Alexandre Proutière. Published in: CDC (2023)
Keyphrases
- policy evaluation
- markov decision processes
- policy iteration
- state space
- optimal policy
- reinforcement learning
- finite state
- dynamic programming
- infinite horizon
- average reward
- reinforcement learning algorithms
- decision processes
- average cost
- planning under uncertainty
- partially observable
- partially observable markov decision processes
- temporal difference
- markov decision process
- reward function
- linear programming
- least squares