Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning.
Zhendong Wang, Jonathan J. Hunt, Mingyuan Zhou
Published in: CoRR (2022)
Keyphrases
- optimal policy
- reinforcement learning
- policy search
- markov decision process
- control policies
- state space
- markov decision processes
- reward function
- markov decision problems
- decision problems
- policy iteration
- control policy
- infinite horizon
- finite horizon
- state dependent
- access control policies
- finite state
- dynamic programming
- average reward
- total reward
- approximate policy iteration
- policy gradient methods
- action selection
- function approximation
- asymptotically optimal
- diffusion process
- policy gradient
- management policies
- anisotropic diffusion
- sufficient conditions
- machine learning
- partially observable markov decision processes
- state action
- conflict resolution
- actor critic
- selective perception
- policy iteration algorithm