Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning.
Zhendong Wang, Jonathan J. Hunt, Mingyuan Zhou. Published in: ICLR (2023)
Keyphrases
- optimal policy
- reinforcement learning
- policy search
- Markov decision process
- control policies
- Markov decision processes
- control policy
- reward function
- Markov decision problems
- state space
- policy gradient methods
- policy iteration algorithm
- finite horizon
- reinforcement learning algorithms
- decision problems
- partially observable Markov decision processes
- policy iteration
- dynamic programming
- infinite horizon
- approximate policy iteration
- state dependent
- average reward
- total reward
- temporal difference
- partially observable environments
- revenue management
- function approximation
- finite state
- long run
- action selection
- state action
- decision processes
- continuous state
- diffusion process
- management policies
- multi agent
- learning algorithm
- partially observable
- dynamical systems
- natural actor critic