TD3 with Reverse KL Regularizer for Offline Reinforcement Learning from Mixed Datasets.
Yuanying Cai, Chuheng Zhang, Li Zhao, Wei Shen, Xuyun Zhang, Lei Song, Jiang Bian, Tao Qin, Tieyan Liu
Published in: ICDM (2022)
Keyphrases
- reinforcement learning
- temporal difference
- reinforcement learning algorithms
- function approximation
- eligibility traces
- temporal difference learning
- learning algorithm
- state space
- model free
- machine learning
- action selection
- benchmark datasets
- real time
- reinforcement learning methods
- markov decision processes
- semi supervised
- transfer learning
- total variation
- training dataset
- regularization term
- td learning
- learning process
- robotic control
- database
- policy evaluation
- kullback leibler
- data sets
- neural network
- feature selection
- multi agent
- cost function
- dynamic programming
- evaluation function