CertRL: formalizing convergence proofs for value and policy iteration in Coq.

Koundinya Vajjha Avraham Shinnar Barry M. Trager Vasily Pestun Nathan Fulton

Published in: CPP (2021)

Keyphrases

policy iteration
stochastic approximation
convergence rate
markov decision processes
theorem prover
model free
reinforcement learning
fixed point
optimal policy
least squares
temporal difference
sample path
convergence speed
markov decision process
finite state
markov decision problems
average reward
monte carlo
policy evaluation
state space
infinite horizon
optimal control
discounted reward
multistage
step size
linear programming
multi agent
stationary points
machine learning
actor critic