Offline Reinforcement Learning with Closed-Form Policy Improvement Operators.
Jiachen Li, Edwin Zhang, Ming Yin, Qinxun Bai, Yu-Xiang Wang, William Yang Wang
Published in: CoRR (2022)
Keyphrases
- closed form
- reinforcement learning
- optimal policy
- policy search
- iterative procedure
- action selection
- state space
- control policy
- Markov decision process
- action space
- reward function
- function approximators
- closed form solutions
- temporal difference
- learning algorithm
- Markov decision processes
- reinforcement learning algorithms
- model free
- function approximation
- closed form expressions
- partially observable Markov decision processes
- point correspondences
- dynamic programming
- policy gradient
- maximum likelihood estimates
- cost function
- natural image matting