Diffusion Actor-Critic: Formulating Constrained Policy Iteration as Diffusion Noise Regression for Offline Reinforcement Learning.
Linjiajie Fang, Ruoxue Liu, Jing Zhang, Wenjia Wang, Bing-Yi Jing
Published in: CoRR (2024)
Keyphrases
- actor critic
- policy iteration
- reinforcement learning
- markov decision processes
- temporal difference
- model free
- approximate dynamic programming
- optimal policy
- optimal control
- reinforcement learning algorithms
- fixed point
- average reward
- policy gradient
- least squares
- gradient method
- neuro fuzzy
- temporal difference learning
- finite state
- state space
- function approximation
- convergence rate
- markov decision problems
- infinite horizon
- linear programming
- control problems
- markov decision process
- reinforcement learning problems
- learning algorithm
- markov random field
- step size
- evaluation function
- belief propagation
- decision problems
- monte carlo
- semi supervised
- support vector