First-order Policy Optimization for Robust Markov Decision Process.
Yan Li, Tuo Zhao, Guanghui Lan
Published in: CoRR (2022)
Keyphrases
- markov decision process
- optimal policy
- state space
- markov decision processes
- infinite horizon
- reinforcement learning
- finite horizon
- policy iteration
- markov games
- transition matrices
- transition probabilities
- average cost
- robust optimization
- long run
- finite state
- action space
- stationary policies
- higher order
- reward function
- state action
- partial observability
- first order logic