TD3 with Reverse KL Regularizer for Offline Reinforcement Learning from Mixed Datasets.
Yuanying Cai, Chuheng Zhang, Li Zhao, Wei Shen, Xuyun Zhang, Lei Song, Jiang Bian, Tao Qin, Tieyan Liu
Published in: CoRR (2022)
Keyphrases
- reinforcement learning
- temporal difference
- reinforcement learning algorithms
- function approximation
- temporal difference learning
- eligibility traces
- td learning
- state space
- model free
- semi supervised
- database
- policy evaluation
- optimal policy
- multi agent
- learning algorithm
- kullback leibler
- markov decision processes
- synthetic datasets
- evaluation function
- transfer learning
- policy iteration
- monte carlo
- benchmark datasets
- action selection
- information theoretic
- action space
- markov chain
- machine learning
- real time