Offline Reinforcement Learning with On-Policy Q-Function Regularization.
Laixi Shi, Robert Dadashi, Yuejie Chi, Pablo Samuel Castro, Matthieu Geist
Published in: ECML/PKDD (4) (2023)
Keyphrases
- reinforcement learning
- function approximators
- optimal policy
- policy gradient
- control policy
- state action
- policy search
- markov decision process
- function approximation
- action selection
- approximate dynamic programming
- action space
- real time
- reproducing kernel hilbert space
- actor critic
- reinforcement learning problems
- learning process
- optimal control
- markov decision processes
- reinforcement learning algorithms
- multi agent
- learning algorithm
- control policies
- smoothing parameter
- partially observable domains
- policy iteration
- partially observable markov decision processes
- piecewise constant
- weight vector
- temporal difference
- model free
- infinite horizon
- kernel function
- state space
- dynamic programming
- machine learning