CertRL: formalizing convergence proofs for value and policy iteration in Coq.
Koundinya VajjhaAvraham ShinnarBarry M. TragerVasily PestunNathan FultonPublished in: CPP (2021)
Keyphrases
- policy iteration
- stochastic approximation
- convergence rate
- markov decision processes
- theorem prover
- model free
- reinforcement learning
- fixed point
- optimal policy
- least squares
- temporal difference
- sample path
- convergence speed
- markov decision process
- finite state
- markov decision problems
- average reward
- monte carlo
- policy evaluation
- state space
- infinite horizon
- optimal control
- discounted reward
- multistage
- step size
- linear programming
- multi agent
- stationary points
- machine learning
- actor critic