Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization.
Shutong Ding, Ke Hu, Zhenhao Zhang, Kan Ren, Weinan Zhang, Jingyi Yu, Jingya Wang, Ye Shi
Published in: CoRR (2024)
Keyphrases
- reinforcement learning
- optimal policy
- Markov decision process
- policy search
- state space
- transition model
- actor critic
- action selection
- multi agent
- optimization problems
- policy gradient
- Markov decision processes
- control policy
- diffusion process
- control policies
- function approximators
- action space
- policy iteration
- partially observable
- reinforcement learning problems
- partially observable environments
- machine learning
- asymptotically optimal
- function approximation
- state action
- RL algorithms
- optimization algorithm
- policy evaluation
- dynamic programming
- partially observable Markov decision processes
- state and action spaces
- agent receives
- reward function
- global optimization
- image segmentation
- image processing