Entropy-regularized Diffusion Policy with Q-Ensembles for Offline Reinforcement Learning.
Ruoqi Zhang, Ziwei Luo, Jens Sjölund, Thomas B. Schön, Per Mattsson
Published in: CoRR (2024)
Keyphrases
- reinforcement learning
- optimal policy
- policy search
- markov decision process
- state and action spaces
- approximate dynamic programming
- partially observable environments
- action selection
- markov decision processes
- policy iteration
- function approximators
- state action
- policy gradient
- function approximation
- control policies
- action space
- state space
- reinforcement learning algorithms
- reinforcement learning problems
- policy evaluation
- model free
- actor critic
- markov decision problems
- mutual information
- least squares
- reward function
- partially observable
- decision problems
- information theory
- inverse reinforcement learning
- control policy
- long run
- temporal difference
- information theoretic
- rl algorithms
- continuous state
- policy gradient methods
- ensemble methods
- partially observable domains
- infinite horizon
- diffusion process
- finite state
- ensemble learning
- dynamic programming
- decision trees
- transition model
- agent learns
- agent receives
- partially observable markov decision processes
- diffusion model
- machine learning
- learning problems
- learning algorithm
- continuous state spaces
- reinforcement learning methods
- average reward
- learning agent
- anisotropic diffusion
- multi agent