Certified Policy Verification and Synthesis for MDPs under Distributional Reach-avoidance Properties.
S. Akshay, Krishnendu Chatterjee, Tobias Meggendorfer, Dorde Zikelic
Published in: CoRR (2024)
Keyphrases
- optimal policy
- markov decision processes
- markov decision process
- reinforcement learning
- markov decision problems
- policy search
- policy iteration
- co-occurrence
- state space
- state and action spaces
- average reward
- partially observable
- reward function
- infinite horizon
- average cost
- partially observable markov decision processes
- finite horizon
- decision processes
- dynamic pricing
- finite state
- program synthesis
- model checking
- factored mdps
- heuristic search